(54) System and method for measuring and quantizing document quality



(57) Text, images, and/or graphics of electronic documents should be organized and laid out in a two-dimensional format for presentation to the viewer. The best such layout depends upon the content present, the creator's intent, the output device, and the viewer's interests. To analyze the qualitative nature of the layout in quantifiable terms, the electronic document is measured using various quantifiable factors, such as balance, uniformity, white space management, alignment, consistency, and legibility, that impact the qualitative nature of a document. These quantifiable factors are then used to quantize the aesthetics, ease of use, eye-catching ability, interest, communicability, comfort, and convenience of the document.
Description

[0001] The present invention generally relates to the field of document layout, design, and analysis and, more particularly, to methods which quantitatively measure a document's quality based on characteristics inherent in the document itself.

[0002] When documents are created, many decisions must be made as to style, content, layout, and the like. The text, images, and graphics must be organized and laid out in a two-dimensional format with the intention of providing a presentation that will capture, and preferably maintain, the viewer's attention for a time sufficient to get the intended message across. Different style options are available for the various content elements, and choices must be made among them. The best choices for style and layout depend upon content, intent, viewer interests, and so on. To tell whether the choices made about the look and feel of the final version of the document were good or bad, one might request feedback from a set of viewers after they view the document and compile that feedback into something meaningful from which the document's creators or developers can make alterations, changes, or other improvements. This cycle repeats until the document's owners are satisfied that the final version achieves the intended result.

[0003] One factor that contributes to the quality and effectiveness of layout and style decisions for a document is the handling of groups of content elements, since style and layout choices affect groups of content. A group is a collection of content elements. Group membership is a property of the logical structure of the document, whereas the neighborhood of groups can be considered a layout property. While the layout structure often matches the logical structure, there is no requirement that it do so.

[0004] Preferably, one would like to have a quantitative measure of various value properties of the document (measures of the document's "goodness") based on properties inherent in the document itself. In this manner, the document itself provides a level of quantitative feedback. For instance, one property that developers would like to be able to measure is how easy a document is to use. A measure of the ease of use of a document can be used in evaluating or making document design decisions.

[0005] One aspect of the ease of use of a document is the reader's ability to tell which elements belong to a group and which do not. The style and layout decisions made in the presentation of a document can affect the degree of group identity it conveys. In evaluating a document's design for its ease of use, it is therefore useful to have a measure of the degree of group identity. Considerations for ease of use with respect to groups include spatial coherence, spatial separation, alignment separation, heading separation, background separation, and/or style separation. Measures for various characteristics of content, features, and the like could be weighted by intent, relevance, and other parameters, and these could then be combined to obtain one or more overall measures for the document itself. If one had a method for evaluating properties inherent in the document itself, such a measure could be used during the document development process to help determine the optimal presentation.
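As an illustration of weighting measures and combining them into an overall measure, the following Python sketch combines per-factor group measures with a normalized weighted average. The factor identifiers, the weights, and the choice of a weighted average are all assumptions made for illustration; the text above names the considerations but does not say how each is scored or how they are combined.

```python
from typing import Mapping

# Hypothetical identifiers for the group-related considerations named in [0005].
GROUP_FACTORS = (
    "spatial_coherence",
    "spatial_separation",
    "alignment_separation",
    "heading_separation",
    "background_separation",
    "style_separation",
)


def weighted_combination(measures: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Combine per-factor measures in [0, 1] into one score in [0, 1].

    Each measure is multiplied by its weight (standing in for intent or
    relevance), and the sum is divided by the total weight. A factor with
    no entry in `weights` gets weight 0 and so does not affect the result.
    """
    for name, value in measures.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    total_weight = sum(weights.get(name, 0.0) for name in measures)
    if total_weight <= 0.0:
        raise ValueError("at least one measured factor needs a positive weight")
    weighted_sum = sum(value * weights.get(name, 0.0)
                       for name, value in measures.items())
    return weighted_sum / total_weight


# Usage: equal weights except that heading separation counts double.
measures = {
    "spatial_coherence": 0.9,
    "spatial_separation": 0.6,
    "alignment_separation": 0.8,
    "heading_separation": 1.0,
    "background_separation": 0.2,
    "style_separation": 0.5,
}
weights = {name: 1.0 for name in GROUP_FACTORS}
weights["heading_separation"] = 2.0
group_identity = weighted_combination(measures, weights)  # about 0.714 (5.0 / 7.0)
```

Normalizing by the total weight keeps the combined score on the same [0, 1] scale as its inputs, so results stay comparable when weights are changed to reflect a different intent.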

[0006] An aspect of the ease of use of a document is its searchability. Searchability can be defined as the degree to which the document structurally supports finding a desired content element. A document with high searchability provides aids that help in finding desired content. In general, a document with a high searchability measure is easier to use because the portion of the document containing the information of interest is easy to locate.

[0007] Another aspect of a document's ease of use is its degree of distinguishability. The distinguishability of content can be defined as the ability to tell one particular content element apart from another content element within the document. Distinguishability is important in establishing the context for the information conveyed by the element. It can reduce confusion about what that element is and to what group or setting it belongs, and it can also aid in locating a desired element. The distinguishability of the document elements is therefore a contributing factor to the ease of use of the document.
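A minimal sketch of one way distinguishability might be scored, assuming each content element is summarized by a hypothetical style vector (for example relative font size, boldness, and a color offset, each normalized to [0, 1]). The sketch scores an element by its style distance to the most similar neighbor; the vector representation and the distance measure are assumptions, not a method stated in the text.

```python
import math
from typing import Sequence


def style_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two style vectors whose components are
    normalized to [0, 1], rescaled so the result also lies in [0, 1]."""
    if len(a) != len(b) or not a:
        raise ValueError("style vectors must be non-empty and of equal length")
    return math.dist(a, b) / math.sqrt(len(a))


def distinguishability(element: Sequence[float],
                       neighbors: Sequence[Sequence[float]]) -> float:
    """Distance from `element` to its most similar neighbor.

    0.0 means some neighbor is styled identically; an element with no
    neighbors is treated as fully distinguishable (1.0).
    """
    if not neighbors:
        return 1.0
    return min(style_distance(element, n) for n in neighbors)


# Hypothetical style vectors: (relative font size, boldness, color offset).
heading = (1.0, 1.0, 0.0)
body = (0.4, 0.0, 0.0)
print(distinguishability(heading, [body, body]))  # about 0.673
print(distinguishability(body, [body]))           # 0.0
```

Taking the minimum over neighbors reflects the idea that an element is only as distinguishable as its closest look-alike.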

[0008] Another property that it would be desirable to measure quantitatively is the ability of the document to hold the viewer's attention and interest. While much of a document's interest depends upon the actual content and its relevance to the viewer, the style with which that content is presented can also contribute. If a measure of the effect of style decisions on interest could be defined, it could be used in determining a measure of optimal presentation.

[0009] Documents can present content in ways that make it easier to locate individual items. This property can be referred to as 'locatability'. One way to distinguish one content object from another is to evaluate the target object's locatability, i.e., how easy it is to find the object within the document. This is a little different from distinguishability, which tells how well an item can be differentiated from its neighbors. Structural aids such as tables or bullet lists help the document viewer locate objects. Presenting content in a table allows an item's location to be identified by row or column, and the presence of headings for the rows and columns can further increase the ease of locating items. Presenting content items in a list introduces an ordering that aids in locating them, and the use of list bullets or item numbers aids further. Separability and distinguishability also contribute to the locatability of an object.
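The structural aids described above could be turned into a rough locatability score along the following lines. This is a hypothetical sketch: the `Container` type, the credit values, and the additive scoring capped at 1.0 are illustrative assumptions, and the sketch leaves out the contributions of separability and distinguishability.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical description of the structure that holds a content item."""
    kind: str                        # "table", "list", or "free" (no structural aid)
    has_row_headings: bool = False
    has_column_headings: bool = False
    has_bullets_or_numbers: bool = False


# Hypothetical credits for each structural aid; the text says these aids
# help but assigns them no numeric values.
CREDITS = {
    "table": 0.5,               # item addressable by row and column
    "row_headings": 0.2,
    "column_headings": 0.2,
    "list": 0.4,                # list ordering aids finding an item
    "bullets_or_numbers": 0.3,
}


def structural_locatability(c: Container) -> float:
    """Sum the credits for the aids present, capped at 1.0."""
    score = 0.0
    if c.kind == "table":
        score += CREDITS["table"]
        if c.has_row_headings:
            score += CREDITS["row_headings"]
        if c.has_column_headings:
            score += CREDITS["column_headings"]
    elif c.kind == "list":
        score += CREDITS["list"]
        if c.has_bullets_or_numbers:
            score += CREDITS["bullets_or_numbers"]
    return min(score, 1.0)


# Usage
print(structural_locatability(Container("table", True, True)))                 # about 0.9
print(structural_locatability(Container("list", has_bullets_or_numbers=True)))  # about 0.7
print(structural_locatability(Container("free")))                              # 0.0
```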

[0010] Measures for various aspects of content, features, and the like could be weighted by intent, relevance, and other parameters, and these could then be combined to obtain one or more overall measures for the document itself. If one had a method for evaluating such properties inherent in the document itself, such a measure could be used during the document development process to help determine the optimal presentation.

[0011] Therefore, it is desirable to provide a methodology to measure the quality of a document in a quantifiable way. Moreover, it is desirable to provide a quantifiable measurement of quality that is usable in evaluating the document and improving its quality, so as to add value to the information being conveyed through the document.

[0012] A first aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; and generates a quantized aesthetic value for the document based on a predetermined combining function, the predetermined combining function combining the quantized measured predetermined set of characteristics, the quantized aesthetic value being a measure of quality of the document.
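To make the three steps of the first aspect concrete (measure, quantize, combine), here is a minimal Python sketch. It assumes that each characteristic has already been measured as a number in [0, 1], that quantization maps it uniformly onto a fixed number of integer steps, and that the predetermined combining function is a weighted mean; the characteristic names and the functions `quantize` and `aesthetic_value` are hypothetical.

```python
from typing import Mapping


def quantize(value: float, levels: int = 5) -> int:
    """Map a measured characteristic in [0, 1] to an integer step 0..levels-1."""
    if levels < 2:
        raise ValueError("need at least two quantization levels")
    if not 0.0 <= value <= 1.0:
        raise ValueError("measured characteristic must be normalized to [0, 1]")
    return min(int(value * levels), levels - 1)


def aesthetic_value(measured: Mapping[str, float],
                    weights: Mapping[str, float],
                    levels: int = 5) -> float:
    """Quantize each measured characteristic, then combine the steps with a
    weighted mean (an assumed combining function), rescaled to [0, 1]."""
    steps = {name: quantize(v, levels) for name, v in measured.items()}
    total_weight = sum(weights[name] for name in steps)
    if total_weight <= 0.0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weights[name] * step for name, step in steps.items())
    return weighted / (total_weight * (levels - 1))


# Usage with hypothetical measurements, already normalized to [0, 1].
measured = {"balance": 0.82, "uniformity": 0.55,
            "white_space": 0.30, "alignment": 0.95}
weights = {name: 1.0 for name in measured}
print(aesthetic_value(measured, weights))  # steps 4, 2, 1, 4 -> 11 / 16 = 0.6875
```

Quantizing before combining means small measurement differences within one step do not change the result, which is one reason to quantize at all.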

[0013] A second aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized aesthetic value for the document based on a predetermined aesthetic combining function, the predetermined aesthetic combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized ease of use value for the document based on a predetermined ease of use combining function, the predetermined ease of use combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized eye-catching ability value for the document based on a predetermined eye-catching ability combining function, the predetermined eye-catching ability combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized interest value for the document based on a predetermined interest combining function, the predetermined interest combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized convenience value for the document based on a predetermined convenience combining function, the predetermined convenience combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized comfort value for the document based on a predetermined comfort combining function, the predetermined comfort combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized communicability value for the document based on a predetermined communicability combining function, the predetermined communicability combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized aesthetic value, the generated quantized ease of use value, the generated quantized eye-catching ability value, the generated quantized interest value, the generated quantized convenience value, the generated quantized comfort value, and the generated quantized communicability value.
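The second aspect combines in two levels: each of the seven sub-values draws on its own subset of the quantized characteristics, and the quality combining function then merges the seven sub-values. A sketch of that structure follows, assuming quantized characteristics rescaled to [0, 1] and an arithmetic mean at both levels. The subset assignments are hypothetical, since the text leaves the subsets unspecified, and the characteristic names are taken from the factors listed in the abstract.

```python
from typing import Callable, Dict, Mapping, Sequence, Tuple

Combiner = Callable[[Sequence[float]], float]


def mean(values: Sequence[float]) -> float:
    """Default combining function: arithmetic mean (an assumption)."""
    if not values:
        raise ValueError("cannot combine an empty subset")
    return sum(values) / len(values)


# Hypothetical subsets; the text says each sub-value combines "a
# predetermined subset" but does not list which characteristics.
SUBSETS: Mapping[str, Sequence[str]] = {
    "aesthetic": ("balance", "uniformity", "white_space", "alignment"),
    "ease_of_use": ("legibility", "consistency", "alignment"),
    "eye_catching_ability": ("balance", "white_space"),
    "interest": ("uniformity", "consistency"),
    "convenience": ("legibility", "consistency"),
    "comfort": ("white_space", "legibility"),
    "communicability": ("legibility", "alignment", "consistency"),
}


def document_quality(quantized: Mapping[str, float],
                     subsets: Mapping[str, Sequence[str]] = SUBSETS,
                     combine_sub: Combiner = mean,
                     combine_quality: Combiner = mean) -> Tuple[float, Dict[str, float]]:
    """Two-level combination: characteristics -> seven sub-values -> quality."""
    sub_values = {name: combine_sub([quantized[c] for c in chars])
                  for name, chars in subsets.items()}
    return combine_quality(list(sub_values.values())), sub_values


# Usage: quantized characteristics rescaled to [0, 1].
quantized = {"balance": 1.0, "uniformity": 0.5, "white_space": 0.5,
             "alignment": 0.0, "consistency": 1.0, "legibility": 1.0}
quality, parts = document_quality(quantized)
print(parts["aesthetic"])  # (1.0 + 0.5 + 0.5 + 0.0) / 4 = 0.5
```

Passing the combining functions as parameters keeps the sub-value subsets and the combination rules independent, so either can be changed without touching the other.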

[0014] A third aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; and generates a quantized ease of use value for the document based on a predetermined combining function, the predetermined combining function combining the quantized measured predetermined set of characteristics, the quantized ease of use value being a measure of quality of the document.

[0015] A fourth aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; and generates a quantized eye-catching ability value for the document based on a predetermined combining function, the predetermined combining function combining the quantized measured predetermined set of characteristics, the quantized eye-catching ability value being a measure of quality of the document.

[0016] A fifth aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; and generates a quantized interest value for the document based on a predetermined combining function, the predetermined combining function combining the quantized measured predetermined set of characteristics, the quantized interest value being a measure of quality of the document.

[0017] A sixth aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; and generates a quantized comfort value for the document based on a predetermined combining function, the predetermined combining function combining the quantized measured predetermined set of characteristics, the quantized comfort value being a measure of quality of the document.

[0018] A seventh aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; and generates a quantized convenience value for the document based on a predetermined combining function, the predetermined combining function combining the quantized measured predetermined set of characteristics, the quantized convenience value being a measure of quality of the document.

[0019] Another aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; and generates a quantized communicability value for the document based on a predetermined combining function, the predetermined combining function combining the quantized measured predetermined set of characteristics, the quantized communicability value being a measure of quality of the document.

[0020] A further aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized aesthetic value for the document based on a predetermined aesthetic combining function, the predetermined aesthetic combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized ease of use value for the document based on a predetermined ease of use combining function, the predetermined ease of use combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized aesthetic value and the generated quantized ease of use value.

[0021] A tenth aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized aesthetic value for the document based on a predetermined aesthetic combining function, the predetermined aesthetic combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized ease of use value for the document based on a predetermined ease of use combining function, the predetermined ease of use combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized eye-catching ability value for the document based on a predetermined eye-catching ability combining function, the predetermined eye-catching ability combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized aesthetic value, the generated quantized ease of use value, and the generated quantized eye-catching ability value.

[0022] A further aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized eye-catching ability value for the document based on a predetermined eye-catching ability combining function, the predetermined eye-catching ability combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized ease of use value for the document based on a predetermined ease of use combining function, the predetermined ease of use combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized eye-catching ability value and the generated quantized ease of use value.

[0023] A further aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized interest value for the document based on a predetermined interest combining function, the predetermined interest combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized ease of use value for the document based on a predetermined ease of use combining function, the predetermined ease of use combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized interest value and the generated quantized ease of use value.

[0024] Another aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized eye-catching ability value for the document based on a predetermined eye-catching ability combining function, the predetermined eye-catching ability combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized interest value for the document based on a predetermined interest combining function, the predetermined interest combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized eye-catching ability value and the generated quantized interest value.

[0025] Another aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized eye-catching ability value for the document based on a predetermined eye-catching ability combining function, the predetermined eye-catching ability combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized comfort value for the document based on a predetermined comfort combining function, the predetermined comfort combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized eye-catching ability value and the generated quantized comfort value.

[0026] A further aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized aesthetics value for the document based on a predetermined aesthetics combining function, the predetermined aesthetics combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized interest value for the document based on a predetermined interest combining function, the predetermined interest combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized aesthetics value and the generated quantized interest value.

[0027] A further aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized interest value for the document based on a predetermined interest combining function, the predetermined interest combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized ease of use value for the document based on a predetermined ease of use combining function, the predetermined ease of use combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized interest value and the generated quantized ease of use value.

[0028] Another aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized aesthetics value for the document based on a predetermined aesthetics combining function, the predetermined aesthetics combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized communicability value for the document based on a predetermined communicability combining function, the predetermined communicability combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized aesthetics value and the generated quantized communicability value.

[0029] Another aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized communicability value for the document based on a predetermined communicability combining function, the predetermined communicability combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized ease of use value for the document based on a predetermined ease of use combining function, the predetermined ease of use combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized communicability value and the generated quantized ease of use value.

[0030] A further aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized aesthetics value for the document based on a predetermined aesthetics combining function, the predetermined aesthetics combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized comfort value for the document based on a predetermined comfort combining function, the predetermined comfort combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized aesthetics value and the generated quantized comfort value.

[0031] A further aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized comfort value for the document based on a predetermined comfort combining function, the predetermined comfort combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized ease of use value for the document based on a predetermined ease of use combining function, the predetermined ease of use combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized comfort value and the generated quantized ease of use value.

[0032] Another aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized aesthetics value for the document based on a predetermined aesthetics combining function, the predetermined aesthetics combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized convenience value for the document based on a predetermined convenience combining function, the predetermined convenience combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized aesthetics value and the generated quantized convenience value.

[0033] Another aspect of the present invention is a method for quantifying a measure of quality of a document. The method measures a predetermined set of characteristics of the document; quantizes the measured predetermined set of characteristics of the document; generates a quantized convenience value for the document based on a predetermined convenience combining function, the predetermined convenience combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; generates a quantized ease of use value for the document based on a predetermined ease of use combining function, the predetermined ease of use combining function combining a predetermined subset of the quantized measured predetermined set of characteristics; and generates a quantized quality value for the document based on a predetermined quality combining function, the predetermined quality combining function combining the generated quantized convenience value and the generated quantized ease of use value.
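Each of the paired aspects in [0020] and [0022] to [0033] ends with a quality combining function over two generated sub-values, and the text does not fix that function. One assumed choice, sketched below, is a weighted geometric mean, which, unlike an arithmetic mean, penalizes a document that scores very poorly on either sub-value. The function name and the weight parameter `w_a` are hypothetical.

```python
def weighted_geometric_mean(a: float, b: float, w_a: float = 0.5) -> float:
    """Combine two sub-values in [0, 1]; `w_a` is the weight of `a` and
    1 - w_a the weight of `b`. With both weights positive, a zero in
    either input forces the combined quality to zero."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("sub-values must lie in [0, 1]")
    if not 0.0 <= w_a <= 1.0:
        raise ValueError("w_a must lie in [0, 1]")
    return a ** w_a * b ** (1.0 - w_a)


# Usage: e.g. an aesthetic value of 0.9 and an ease of use value of 0.4.
print(weighted_geometric_mean(0.9, 0.4))  # about 0.6 (arithmetic mean: 0.65)
print(weighted_geometric_mean(0.9, 0.0))  # 0.0
```

Which behavior is preferable depends on whether a strong score on one sub-value should be allowed to compensate for a weak one: an arithmetic mean allows that, while the geometric mean largely does not.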

Figure 1 is a block diagram illustrating an architectural layout for quantifiably measuring document quality according 
to the concepts of the present invention; 

Figure 2 illustrates a conceptual circuit for quantifiably measuring document quality according to the concepts of 
the present invention; 

Figure 3 illustrates a conceptual circuit for quantifiably measuring document aesthetics according to the concepts 
of the present invention; 

Figures 4 to 7 illustrate examples of visual balance according to the concepts of the present invention; 

Figures 8 and 9 illustrate examples of quantifiably measuring visual balance according to the concepts of the 

present invention; 

Figure 10 illustrates a conceptual circuit for quantifiably measuring visual balance according to the concepts of 
the present invention; 

Figures 11 and 12 illustrate examples of non-uniform distribution of content objects over a page according to the 
concepts of the present invention; 

Figures 13 to 15 illustrate examples of white space fraction according to the concepts of the present invention; 
Figure 16 illustrates an example of trapped white space according to the concepts of the present invention; 
Figures17 to 20 illustrate examples of quantifiably measuring trapped white space according to the concepts of 
the present invention; 

Figure 21 illustrates an example of defining the trapped white space according to the concepts of the present 
invention; 

Figures 22 to 24 illustrate examples of alignment according to the concepts of the present invention; 

Figure 25 illustrates an example of quantifiably measuring and graphically plotting alignment with respect to a left 

edge according to the concepts of the present invention; 

Figure 26 illustrates a conceptual circuit for quantifiably measuring document alignment according to the concepts 
of the present invention; 

Figures 27 to 30 illustrate examples of document regularity according to the concepts of the present invention; 

Figure 31 illustrates an example of page security according to the concepts of the present invention; 

Figure 32 illustrates an example of page proportionality according to the concepts of the present invention; 

Figure 33 illustrates an example of separability according to the concepts of the present invention; 

Figure 34 illustrates an example of group identity according to the concepts of the present invention; 

Figure 35 illustrates a conceptual circuit for quantifiably measuring group ease of use according to the concepts of the present invention;

Figure 36 illustrates a conceptual circuit for quantifiably measuring effective separation according to the concepts of the present invention;

Figures 37 to 41 illustrate examples of separation according to the concepts of the present invention; 

Figure 42 illustrates a conceptual circuit for quantifiably measuring effective distinguishability according to the concepts of the present invention;

Figure 43 illustrates a conceptual circuit for quantifiably measuring total distinguishability according to the concepts 
of the present invention; 

Figures 44 to 46 illustrate examples of distinguishability according to the concepts of the present invention; 
Figure 47 illustrates a conceptual circuit for quantifiably measuring direct locatability according to the concepts of 
the present invention; 

Figure 48 illustrates a conceptual circuit for quantifiably measuring member locatability according to the concepts 
of the present invention; 

Figures 49 and 50 illustrate examples of locatability according to the concepts of the present invention; 

Figure 51 illustrates a conceptual circuit for quantifiably measuring total locatability according to the concepts of the present invention;

Figure 52 illustrates a conceptual circuit for quantifiably measuring group identity according to the concepts of the 
present invention; 

Figures 53 and 54 illustrate examples of coherence according to the concepts of the present invention; 
Figure 55 illustrates examples of group boundary area according to the concepts of the present invention; 
Figures 56 and 57 illustrate examples of style according to the concepts of the present invention; 
Figure 58 illustrates a conceptual circuit for quantifiably measuring eye catching ability according to the concepts 
of the present invention; 

Figure 59 illustrates an example of a color gamut according to the concepts of the present invention; 

Figure 60 illustrates an example of a hue angle according to the concepts of the present invention; 

Figure 61 illustrates a conceptual circuit for quantifiably measuring interest according to the concepts of the present invention;

Figure 62 illustrates an example of variety according to the concepts of the present invention; 

Figure 63 illustrates an example of change rate according to the concepts of the present invention; 

Figure 64 illustrates an example of graphic fraction according to the concepts of the present invention; 

Figure 65 illustrates a conceptual circuit for quantifiably measuring communicability according to the concepts of the present invention;

Figure 66 illustrates a conceptual circuit for quantifiably measuring legibility according to the concepts of the present 
invention; 

Figure 67 illustrates a conceptual circuit for quantifiably measuring decipherability according to the concepts of 
the present invention; 

Figure 68 illustrates an example of line retrace according to the concepts of the present invention; 

Figure 69 illustrates an example of line separation according to the concepts of the present invention; 

Figures 70 to 73 illustrate examples of quadding according to the concepts of the present invention; 

Figure 74 illustrates a conceptual circuit for quantifiably measuring technical level according to the concepts of the present invention;

Figures 75 to 77 illustrate examples of image balance according to the concepts of the present invention; 
Figure 78 illustrates a conceptual circuit for quantifiably measuring ease of progression according to the concepts 
of the present invention; 

Figure 79 illustrates an example of consistency of scan according to the concepts of the present invention; 
Figure 80 illustrates an example of consistency of order according to the concepts of the present invention; 
Figure 81 illustrates a conceptual circuit for quantifiably measuring ease of navigation according to the concepts 
of the present invention; 

Figure 82 illustrates a conceptual circuit for quantifiably measuring comfort according to the concepts of the present 
invention; 

Figure 83 illustrates a conceptual circuit for quantifiably measuring neatness according to the concepts of the 
present invention; 

Figures 84 and 85 illustrate examples of neatness according to the concepts of the present invention; 

Figure 86 illustrates a conceptual circuit for quantifiably measuring intimidation according to the concepts of the present invention;

Figure 87 illustrates an example of intimidation according to the concepts of the present invention; 

Figures 88 and 89 illustrate examples of luminance according to the concepts of the present invention; 

Figures 90 and 91 illustrate examples of size according to the concepts of the present invention; 

Figure 92 illustrates a conceptual circuit for quantifiably measuring convenience according to the concepts of the present invention;

Figure 93 illustrates a conceptual circuit for quantifiably measuring consistency of position according to the concepts of the present invention;

Figure 94 illustrates a conceptual circuit for quantifiably measuring consistency according to the concepts of the present invention;

Figure 95 illustrates a definable window for quantifiably measuring the various quality characteristics of a document according to the concepts of the present invention; and

Figure 96 illustrates color dissonance as a function of hue difference. 

[0034] The present invention is directed to various methods for quantifying various document properties to assist document developers in determining document quality. Quality can have several competing aspects, and the overall quality can depend not only on the absolute properties of the document, but also on the relative importance of these properties to the beholder. One aspect or class of document quality is its aesthetics, that is, its beauty: the degree to which pleasure can be derived from its appearance. Often this property is manifested in the degree of displeasure generated by an ugly layout.

[0035] Another aspect or class contributing to the quality of a document is the effectiveness with which it communicates information to the user. Documents are vessels of information, and the ease with which the viewer can gather and understand the information can be an important factor in how well the document does its job.

[0036] A third aspect or class that contributes to the quality of a document is its ease of use. One factor that contributes to the ease of use is how convenient the document is, that is, whether it can be used with a minimum of effort. A second factor contributing to overall ease of use is content grouping. Information often has some logical organization, and documents can reflect this organization by grouping the content. The effectiveness with which the document conveys this grouping and enables the viewer to capitalize on it contributes to the ease of use.

[0037] A fourth aspect or class that enters into document quality is the degree to which the user is comfortable with it. Documents that create anxiety are generally not as desirable as those that the viewer finds soothing and familiar.
[0038] A fifth aspect or class that is an important contributor to the quality of some documents is the degree to which they can catch the eye of the viewer. Advertisements, for example, strive to capture the attention and not to be easily overlooked.

[0039] A sixth, similar aspect or class is the ability of the document to maintain interest. It is one thing to capture the attention, but another to hold it and to avoid boredom as the document is used.

[0040] A seventh aspect or class of quality can be the economy of the document, both to the creator and to the 
viewer. If the other contributors to quality are the same, then a lower cost version of a document is generally considered 
better than a more expensive one. While other factors may also contribute to document quality, the measuring of these 
seven aspects or classes provides a good basis for evaluating document quality. 

[0041] The aspects or classes listed as contributing to document quality (with the exception of economy) are usually
considered soft and ill-defined concepts; however, these properties can be quantified. The method for measuring and 
quantifying these attributes is to first identify document features that contribute to the property. Quantifiable measures 
of the individual features are then devised. And finally, the individual feature values are combined to form an overall 
score for the more abstract property. 

[0042] Figure 1 is a block diagram illustrating an architectural layout for quantifiably measuring document quality according to the concepts of the present invention. As illustrated in Figure 1, the quantization of a document's quality can be carried out by a system architecture that includes a memory 91, a document processor circuit 92, a microprocessor 90, a user interface 94, and a display 93. The memory 91 may store, for processing purposes, a portion of a document, a page of the document, a portion of a page of a document, a document, or multiple documents.

[0043] The display 93 may display the document, or the portion thereof, that is being quantized with respect to quality. The display 93 may also display the various options that a user can choose through the user interface 94 with respect to the classes that the user wishes to quantize, or the various parameters to be measured within the chosen quantization class, which a user can also choose through the user interface 94.

[0044] The quantization architecture of Figure 1 further includes various circuits for measuring/quantizing various 
aspects or classes of document quality. These circuits include aesthetics quantizer 10, ease of use quantizer 20, eye
catching ability quantizer 30, interest quantizer 40, communicability quantizer 50, comfort quantizer 60, convenience 
quantizer 70, and economy quantizer 75. Each of these (except the economy quantizer, for which measures and meth- 
ods are well known) will be discussed in more detail below. 

[0045] Figure 2, on the other hand, illustrates a single quality quantizer or combiner 80 that receives measured and/or calculated quantized values representing aesthetics, ease of use, eye catching ability, interest, communicability, comfort, and/or convenience. Quality quantizer or combiner 80 processes these values based upon a predetermined algorithm so as to generate a quality quantization value for the document or portion of the document being analyzed. If alternate or additional measures of quality are considered, they would also be combined at combiner 80.
[0046] Each value is based on properties inherent in the document itself. The values are individually combined into an overall value or score for the document. Other methods for measuring, assigning, or otherwise associating a quantifiable value for document quality should be considered within the scope of the present invention, such that the present invention is directed not only to the particular methods put forth, but also to the much broader concept of determining a value for document quality.

[0047] In a preferred embodiment of the present invention, each rule is defined to produce a value ranging between 
0 and 1 such that 0 means low value and 1 means high value. This enables quantized quality values to be calculated 
and combined to form the overall document quality measure. 

[0048] If $V_i$ is the value calculated for the $i$th rule, the document quality measure $V_Q$ is formed as a function $E$ of these contributions such that $V_Q = E(V_1, V_2, \ldots, V_N)$. The combining function $E$ can be as simple as a weighted average of the contributions. However, because any bad contributor can ruin the document quality no matter how good the others are, a linear combination is not preferred.

[0049] An alternative is $V_Q = \left(\sum_i w_i V_i^{-p}\right)^{-1/p}$. In a preferred embodiment, the $w_i$ factors are weights that specify the relative importance of each rule and should sum to one. The exponent $p$ introduces a non-linearity that can make one bad value overwhelm many good ones. The larger the value of the exponent $p$, the greater this effect.

[0050] A further alternative is $V_Q = \left(\sum_i w_i (d + V_i)^{-p}\right)^{-1/p} - d$. The $w_i$ factors are weights that specify the relative importance of each rule and should sum to one. The exponent $p$ introduces a non-linearity that can make one bad value overwhelm many good ones. The parameter $d$ is a number slightly larger than 0. The larger the value of the exponent $p$, the greater this effect.

[0051] Other combining functions are, for example, the product of the contributions. If weighting of the contributions is desired, this can be achieved by $V_Q = \prod_i V_i^{w_i}$.

[0052] Although the illustrations show circuits for the quality quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.


AESTHETICS 

[0053] For the case of document aesthetics, the methods herein are used to generate quantifiable values for the contributing features of: balance, uniformity, white-space fraction, white-space free-flow, alignment, regularity, page security, and/or aspect ratio (optimal proportionality). As illustrated in Figure 3, a combining circuit 10 (the aesthetics
quantizer 10 of Figure 1) receives measured and/or calculated quantized values representing balance, uniformity, 
white-space fraction, white-space free-flow, alignment, regularity, page security, and/or aspect ratio (optimal propor- 
tionality) and processes these values based upon a predetermined algorithm so as to generate an aesthetic quantiza- 
tion value for the document or portion of the document being analyzed. 

[0054] Each value is based on properties inherent in the document itself. The values are individually combined into an overall value or score for the document. Other methods for measuring, assigning, or otherwise associating a quantifiable value for document quality should be considered within the scope of the present invention, such that the present invention is directed not only to the particular methods put forth, but also to the much broader concept of determining a value for document quality.

[0055] In a preferred embodiment of the present invention, each rule is defined to produce a value ranging between
0 and 1 such that 0 means low value and 1 means high value. This enables quantized quality values to be calculated 
and combined to form the overall document quality measure. 

[0056] If $V_i$ is the value calculated for the $i$th rule, the document quality measure $V_A$ is formed as a function $E$ of these contributions such that $V_A = E(V_1, V_2, \ldots, V_N)$. The combining function $E$ can be as simple as a weighted average of the contributions. However, because any bad contributor can ruin the document quality no matter how good the others are, a linear combination is not preferred.

[0057] An alternative is $V_A = \left(\sum_i w_i V_i^{-p}\right)^{-1/p}$. In a preferred embodiment, the $w_i$ factors are weights that specify the relative importance of each rule and should sum to one. The exponent $p$ introduces a non-linearity that can make one bad value overwhelm many good ones. The larger the value of the exponent $p$, the greater this effect.
[0058] A further alternative is $V_A = \left(\sum_i w_i (d + V_i)^{-p}\right)^{-1/p} - d$. The $w_i$ factors are weights that specify the relative importance of each rule and should sum to one. The exponent $p$ introduces a non-linearity that can make one bad value overwhelm many good ones. The parameter $d$ is a number slightly larger than 0. The larger the value of the exponent $p$, the greater this effect.

[0059] Other combining functions are, for example, the product of the contributions. If weighting of the contributions is desired, this can be achieved by $V_A = \prod_i V_i^{w_i}$.

[0060] Although the illustrations show circuits for the aesthetics quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0061] As illustrated in Figure 3, one of the parameters or factors used in determining aesthetics is the measurement
and quantization of the document's balance or balance in page layout. 

[0062] In a preferred embodiment of the present invention, there are at least two primary ways of defining balance. 
There is an overall balance, where the center of visual weight is at the visual center of a page of a document, as illustrated by Figure 5 with objects 110 on document 100 having a center of visual weight substantially equal to the visual center of the page; and there is a left-right balance, as illustrated by Figures 6 and 7 with objects 110 on document 100, where the weight of an object 110 on the left side of the page is matched by the weight of an object 110 at the same vertical position on the right side of the page. Other definitions for balance are to be considered within the scope of the present invention.

[0063] The overall balance is calculated by determining the center of visual weight 102 of Figure 4 and noting how much it differs from the visual center of the page 101 of Figure 4. Figure 8 provides a detailed example of determining the overall balance of a page of a document.

[0064] As illustrated in Figure 8, if the visual weight of an object $i$ (110 of Figure 8) is $M_i$ (115 of Figure 8) and the object's center is positioned at $(x_i, y_i)$, the center of visual weight for the page layout 116 is at $(x_m, y_m)$, where $x_m = \sum_i x_i M_i / \sum_i M_i$ and $y_m = \sum_i y_i M_i / \sum_i M_i$, the sums being taken over all objects on the page. Objects 110, as used herein, may refer to paragraphs, pictures, graphics, etc.

[0065] If the visual center of the page 116 is at $(x_c, y_c)$ and the maximum x and y distances (117 shows the x distance) an object can be from the visual center 102 are $d_x$ and $d_y$, a balance value can be calculated as $V_{OB} = 1 - \left[\frac{1}{2}\left(\left(\frac{x_m - x_c}{d_x}\right)^2 + \left(\frac{y_m - y_c}{d_y}\right)^2\right)\right]^{1/2}$.

[0066] Note that one can, in a similar way, compute the balance of subclasses of objects by considering only objects
belonging to the subclasses. For example, one could compute the visual balance of all pictorial images on the page, 
or the visual balance of all text blocks. 

[0067] For left-right balance, the x component of the center of visual weight (118 of Figure 9) is calculated as given above. For the y component, however, what is desired is that the left and right halves have the same vertical position, rather than that the total be centered. This is achieved by calculating the center of weight for the left side (118) as $y_L = \sum_i y_i M_i / \sum_i M_i$, where the sums are over the portions of objects 110 with $x_i < x_c$. Similarly, $y_R = \sum_i y_i M_i / \sum_i M_i$, where the sums are over the portions of objects with $x_i > x_c$.

[0068] If a content object spans both the left and right sides of the page, then for the purposes of this calculation the object is divided along the vertical centerline of the page. The left and right divisions of the object are then entered into the left and right sums, respectively. If the page height is $d_h$, a left-right balance value is $V_{LR} = 1 - \left[\frac{1}{2}\left(\left(\frac{x_m - x_c}{d_x}\right)^2 + \left(\frac{y_L - y_R}{d_h}\right)^2\right)\right]^{1/2}$. It is noted that other definitions are possible.

[0069] One might, for example, raise these balance values to powers in order to express the idea that balance is 
non-linear. Ideally, one would perform psychophysical experiments to measure human response to balance and
define a function that matches that response. 
[0070] The above expressions make use of the visual weight of an object. To first order, this can be defined as the object's area times its optical density. However, other psychological effects can also be included. Examples include color carrying more weight than gray, round shapes carrying more weight than rectangular ones, and positioning at the top of the page giving more weight than positioning at the bottom.

[0071] As illustrated in Figure 4, balance is defined with respect to the visual center of the page 101. The visual center 101 lies halfway between the left and right edges of the page, but it is not halfway between the top and bottom.
Typically, the visual center 101 is taken to be offset a twentieth of the page height towards the top from the geometric 
center 102. 

[0072] The balance, as illustrated in Figure 10, is considered a combination of the two approaches described above. In Figure 10, the quantized overall balance value is derived by combining the overall balance and the left-right balance using a balance quantizer or combiner circuit 11.

[0073] One approach is $V_B = 1 - \left[w_{OB}(1 - V_{OB})^{-q} + w_{LR}(1 - V_{LR})^{-q}\right]^{-1/q}$. The weights $w_{OB}$ and $w_{LR}$ give the relative importance of the two balance approaches and should sum to 1. If either of the balance measures is near 1 (good), the overall result is also near 1. The exponent $q$ determines how strong this behavior is.

[0074] Although the illustration shows a circuit for the balance quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0075] As illustrated in Figures 11 and 12, another parameter or factor used in determining aesthetics is the meas- 
urement and quantization of the document's uniformity. 

[0076] In a preferred embodiment of the present invention, it is preferred to have content objects 110 distributed 
uniformly over a page 100, as illustrated in Figure 12, and not clumped together, as illustrated in Figure 11. However,
for other values, such as attention grabbing, it may be beneficial to have clustered and even unbalanced positioning. 
Uniformity is preferred. 

[0077] Non-uniformity is defined herein as the variance of the visual density. For a portion of a page, the visual density is the visual weight of the objects contained within the portion divided by the portion's area, such that $D_i = \sum_j M_j / A_i$, where the sum is over objects $j$ contained in page portion $i$. Densities are preferably scaled to range between 0 and 1. A rescaling may be needed if the visual weight includes factors, in addition to the optical density, that alter the range of values. An average page density can also be defined as the sum of the visual weights for all objects on the page divided by the imageable area of the page.

[0078] The imageable area $A_{PI}$ is typically the area of the page excluding margins, and $D_{AV} = \sum_j M_j / A_{PI}$. A non-uniformity value is calculated by dividing the imageable area into a small number of portions and comparing the visual density for the portions to the average page density.

[0079] A non-uniformity value can be calculated by squaring the difference between the visual density of each portion of the page and the average page density, weighting it by the portion's area, summing over the portions, and normalizing by the total area. Subtracting this from 1 gives a uniformity value. In other words, the value can be defined as $V_{NU} = 1 - \sum_i (D_i - D_{AV})^2 A_i / \sum_i A_i$.
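The value $V_{NU}$ of [0077] to [0079] can be sketched as below. This is a hypothetical helper under stated assumptions: the caller has already divided the imageable area into portions that tile it exactly, `weight[i]` is the total visual weight falling inside portion `i` (objects crossing a portion boundary are assumed to be split beforehand), and densities are assumed already scaled to [0, 1].

```c
/* V_NU = 1 - sum_i (D_i - D_AV)^2 A_i / sum_i A_i, with D_i = weight_i / A_i
   and D_AV = total weight / total (imageable) area. */
double uniformity_value(const double *weight, const double *area, int n)
{
    double total_weight = 0.0, total_area = 0.0;
    for (int i = 0; i < n; i++) {
        total_weight += weight[i];
        total_area += area[i];
    }
    double d_av = total_weight / total_area;     /* average page density D_AV */
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        double diff = weight[i] / area[i] - d_av; /* D_i - D_AV */
        acc += diff * diff * area[i];             /* area-weighted squared deviation */
    }
    return 1.0 - acc / total_area;
}
```

Evenly spread weight gives 1. Placing all of the weight in one of four equal portions, each of unit density when full, gives 0.8125; splitting it as densities 1 and 0 across two equal portions gives 0.75.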
[0080] The average page density can also be calculated for each page individually, or an overall average page 
density can be determined from the visual weight of all objects on portions of all pages and the area of all pages. 
[0081] An alternative to calculating a single non-uniformity value for the document directly is to calculate non-uniformity values for individual pages and then combine the page values by some means such as an average, or by a non-linear scheme that might, for example, yield a low result if any page has a low value. Other uniformity measures are possible; for example, the true variance in the densities can be calculated and used to give non-uniformity. Alternatively, a function can be constructed from measured human responses to differing uniformities.

[0082] As illustrated in Figures 13 to 15, another parameter or factor used in determining aesthetics is the measurement and quantization of the document's white space fraction.

[0083] In a preferred embodiment of the present invention, a good page design is one with white space (including 
margins) totaling about half of the total page area. The non-white space area can be estimated by totaling the areas 
of the content objects. 

[0084] In Figure 13, the white space fraction, the amount of area not associated with an object 110 on page 100, 
totals more than half of the imageable area and thus it is undesirable. In Figure 14, the white space fraction, the amount
of area not associated with an object 110 on page 100, totals less than half of the imageable area and thus it is also 
undesirable. Lastly, in Figure 15, the white space fraction, the amount of area not associated with an object 110 on 
page 100, totals about half of the imageable area and thus it is optimal. 

[0085] The total area of the objects 110 can be scaled by the total page area $A_p$, and the difference between this value and the desired 50% can be found. Squaring the difference to give a positive number produces a measure of how much the layout differs from the 50% rule. Scaling by 4 to get a number ranging between 0 and 1 and then subtracting this from 1 gives the white space fraction quantization value. Thus $V_{WS} = 1 - 4\left(\sum_i A_i / A_p - 0.5\right)^2$.
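As a worked check of [0085], the hypothetical helper below evaluates $V_{WS}$ from a list of object areas. It assumes the objects do not overlap, so that their summed area equals the non-white area; overlapping objects would otherwise be double counted.

```c
/* V_WS = 1 - 4 (sum_i A_i / A_p - 0.5)^2, assuming non-overlapping objects. */
double white_space_fraction_value(const double *object_area, int n,
                                  double page_area)
{
    double covered = 0.0;
    for (int i = 0; i < n; i++)
        covered += object_area[i];
    double off = covered / page_area - 0.5;   /* distance from the 50% target */
    return 1.0 - 4.0 * off * off;
}
```

Content covering exactly half the page scores 1, content covering a quarter scores 0.75, and an empty or completely filled page scores 0.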

[0086] Other measures of the effect of the white space fraction on document aesthetics and on document quality 
are envisioned herein and should be considered within the scope of the present invention, for example, a function of 
measured human responses to differing white space fractions.

[0087] As illustrated in Figures 16 to 21, another parameter or factor used in determining aesthetics is the measurement and quantization of the document's trapped white space.

[0088] In a preferred embodiment of the present invention, it is desired that there should not be any large blocks of 
white space trapped, in the middle of the page, by content. The white space should always be connected to the margins. 
[0089] To quantize this class of trapped white space, an efficient method of detecting trapped white space is illustrated
in Figures 16 to 21 and discussed in more detail below. 

[0090] The class of trapped white space is primarily concerned with relatively large blocks of white space. Efficiency can therefore be improved by performing the trapped white space analysis at a coarse resolution. The approach taken is to determine the area of all white space that can be accessed directly from the margins. This area is then added to the area of the content objects (110 of Figure 16) and compared to the area of the page. Any difference becomes the amount of trapped white space (120 of Figure 16).

[0091] To achieve this, four profiles (Figures 17-20) of the white space accessible from the four margins of the document are constructed. These profiles are preferably stored in arrays at the coarse resolution; call the arrays, for example, TopProf, BottomProf, LeftProf, and RightProf. Elements of the TopProf and BottomProf arrays are initialized to the page height, while the LeftProf and RightProf arrays are initialized to the page width.

[0092] Next, all content objects 110 are stepped through, and for each, its left (Figure 17), right (Figure 18), top (Figure 19), and bottom (Figure 20) boundary positions 121, 122, 123, and 124, respectively, are found. This information is used to update the profile arrays.

[0093] For points from the left to the right boundary, the value stored in the TopProf array is compared to the top boundary, and the array value is replaced with the top value if top is smaller. The difference between the page height and the bottom boundary is compared to the BottomProf array value, and the array is updated with the smaller result. This is captured in the following:
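```c
/* Update the top and bottom profiles for one content object. */
for (x = L; x < R; x++)
{
    if (T < TopProf[x])
        TopProf[x] = T;          /* nearest content seen from the top edge */
    if (H - B < BottomProf[x])
        BottomProf[x] = H - B;   /* nearest content seen from the bottom edge */
}
```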

[0094] Here L, R, T, B contain the left, right, top, and bottom boundary positions of the content object respectively, 
and H is the page height. Similar calculations update the LeftProf and RightProf arrays for the content object. 
[0095] The total white space area (125 of Figure 21) connected to the page edges can be found by examining the entire page and checking each point position against the profile arrays. A sum of all points that lie between a page edge and the corresponding profile boundary is computed. Summing over points in this manner avoids double counting of areas where profiles overlap. Pseudo-code to do the computation follows:

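```c
Freeflow = 0;
for (x = 0; x < W; x++)
{
    for (y = 0; y < H; y++)
    {
        /* A point is edge-connected white space if it lies between a page
           edge and that edge's profile boundary. */
        if (x < LeftProf[y] || W - x < RightProf[y]
            || y < TopProf[x] || H - y < BottomProf[x])
            Freeflow = Freeflow + pixelArea;
    }
}
```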

[0096] If the total area covered by the content objects (being careful not to double count areas where objects overlap) is ContentArea and the area of the page is PageArea = W * H, the white space free-flow value becomes $V_{WF} = (\text{Freeflow} + \text{ContentArea}) / \text{PageArea}$.
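Putting [0091] to [0096] together, the following self-contained C sketch builds all four profiles on a coarse grid, sums the edge-connected white space, and returns $V_{WF}$. It is an illustration under assumptions, not the patent's code: objects are hypothetical rectangles (`GridRect`) given in whole grid cells as half-open ranges lying inside the page, each cell has unit area (so `pixelArea` is 1), allocation failure checks are omitted for brevity, and each cell is tested at its center so that the strict comparisons of the pseudo-code classify cells on either side of a boundary consistently. ContentArea is taken from an occupancy mask so that overlapping objects are counted once.

```c
#include <stdlib.h>

/* Hypothetical content object: columns [left, right), rows [top, bottom),
   in coarse grid cells, with y growing downward. */
typedef struct { int left, right, top, bottom; } GridRect;

/* V_WF = (Freeflow + ContentArea) / PageArea on a W x H grid of unit cells. */
double white_space_free_flow(const GridRect *obj, int n, int W, int H)
{
    int *TopProf = malloc(sizeof(int) * W);
    int *BottomProf = malloc(sizeof(int) * W);
    int *LeftProf = malloc(sizeof(int) * H);
    int *RightProf = malloc(sizeof(int) * H);
    unsigned char *covered = calloc((size_t)W * H, 1);  /* union of objects */
    double freeflow = 0.0, content_area = 0.0;

    /* With no content, white space reaches across the whole page. */
    for (int x = 0; x < W; x++) TopProf[x] = BottomProf[x] = H;
    for (int y = 0; y < H; y++) LeftProf[y] = RightProf[y] = W;

    for (int i = 0; i < n; i++) {
        int L = obj[i].left, R = obj[i].right, T = obj[i].top, B = obj[i].bottom;
        for (int x = L; x < R; x++) {          /* top and bottom profiles */
            if (T < TopProf[x]) TopProf[x] = T;
            if (H - B < BottomProf[x]) BottomProf[x] = H - B;
        }
        for (int y = T; y < B; y++) {          /* left and right profiles */
            if (L < LeftProf[y]) LeftProf[y] = L;
            if (W - R < RightProf[y]) RightProf[y] = W - R;
            for (int x = L; x < R; x++) covered[(size_t)y * W + x] = 1;
        }
    }

    for (int x = 0; x < W; x++) {
        for (int y = 0; y < H; y++) {
            double cx = x + 0.5, cy = y + 0.5;  /* sample each cell at its centre */
            if (covered[(size_t)y * W + x])
                content_area += 1.0;            /* overlaps counted once */
            else if (cx < LeftProf[y] || W - cx < RightProf[y]
                     || cy < TopProf[x] || H - cy < BottomProf[x])
                freeflow += 1.0;                /* white space reachable from an edge */
        }
    }

    free(TopProf); free(BottomProf); free(LeftProf); free(RightProf); free(covered);
    return (freeflow + content_area) / ((double)W * H);
}
```

As a usage example, a hollow square frame on a 10 by 10 grid, with bars spanning cells 2 to 7, encloses a 4 by 4 hole. The 16 trapped cells are excluded, so the frame's 20 content cells plus 64 edge-connected cells give $V_{WF} = 84/100 = 0.84$; a blank page gives 1.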

[0097] A white space free-flow measure for the overall document can be defined as an average of the white space 
free-flow for the individual pages. Non-linear combinations are also possible such as taking the root of the average of 
powers of the page values. 

[0098] Other measures of the effect of trapped white space on aesthetics and on document quality are envisioned 
herein and should be considered within the scope of the present invention, for example, a function of measured re- 
sponses to differing degrees of trapped white space. 

[0099] As illustrated in Figures 22 to 24, another parameter or factor used in determining aesthetics is the measure- 
ment and quantization of the document's alignment. 

[0100] In a preferred embodiment of the present invention, it is desirable for the content objects to be displayed in 
an aligned pattern. The alignment might be for all left edges to have the same x value. Alternatively, it might be for all 
objects to share the same centerline. If right edges are aligned as well as left ones, this is better still. Similarly, rows
of objects should be vertically aligned. 

[0101] Figure 22 illustrates objects 110 on a page 100 that are poorly aligned. On the other hand, Figure 23 illustrates
objects 110 on a page 100 that are well aligned. 

[0102] A method for calculating an alignment measure, which can be applied to objects' left edges, right edges, or
horizontal centerlines, is disclosed. The method also applies to tops, bottoms, and vertical centers. Each application 
yields a different alignment measure. These are then all combined for an overall alignment measure. 
[0103] The alignment measure can be applied to all content objects, or alternatively, can be applied to a restricted 
set of objects such as all objects belonging to a logical group in the document structure. Alignment can also be restricted 
to objects of a given type, such as all paragraphs, or all pictorial images. 

[0104] Each alignment metric may be built on a page basis and provides a quantifiable indication of how well different 
components on the page are aligned. With this approach the individual page alignments can be combined to form an 
alignment measure for the entire document. Alternatively, alignment values can be calculated using document objects 
across multiple pages. When components are aligned well, then the number given by the metric is one. When com- 
ponents are not aligned well, the metric gives a number smaller than one. Advantageously, changing the position of 
the components on the page changes this number in a smooth and continuous way. 

[0105] To achieve this, a histogram of edge (or center) positions (Figure 25) is first created, reflecting the distance
of objects 110 on page 100 from an edge; in the illustration of Figure 24, the edge is the left edge. The histogram is
preferably created at a lower resolution than the actual positioning. This reduces alignment sensitivity and also saves
memory and computation.

[0106] If the histogram array is called EdgeCount, the edge position of an object is x, and the resolution reduction
factor is b, then for each content object EdgeCount[b*x] += 1, with b*x rounded to an integer index. Strong alignment
results in most positions contributing to the same histogram element. If one is interested in the alignment of the left
edges of objects, the histogram is filled using left-edge positions. Alignments for right, top, or bottom edges and center
positions are calculated similarly.

[0107] The alignment measure depends on the distances between neighboring entries in the histogram. The closer 
together the entries are, the higher the score. This dependence must be non-linear. Otherwise, any moving of an object 
closer to its neighbor is canceled by the moving of the object away from its neighbor on the other side. The non-linear 
function used for entries separated by a distance z is: A/(A+z) where A is a constant that controls how fast values fall 
away from 1 as the distance between entries increases. 

[0108] If two edges are aligned, the distance separating them is z = 0 and the function yields 1. This provides a
contribution for the strength of the entries at that position.

[0109] In other words, if a position has n edges contributing, n-1 separations of distance zero exist between those
edges. As such, there should be a contribution of n-1 from an entry count of n, in addition to the contributions from the
separations between neighboring entry positions. If the total number of components is NumberOfObjects, the maximum
contribution, reached when they are all perfectly aligned, is NumberOfObjects-1. Dividing by this value normalizes the
score so that the final result ranges between 0 and 1.

[0110] The calculation of the alignment is described by the following: 
i = 0;
while (EdgeCount[i] == 0)
    i = i + 1;
align = EdgeCount[i] - 1;
for (j = i + 1; j < b*W; j++)
    if (EdgeCount[j] != 0)
    {
        align = align + A/(A + j - i) + EdgeCount[j] - 1;
        i = j;    /* the next separation is measured from this entry */
    }
align = align / (NumberOfObjects - 1);
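The histogram construction of [0106] and the scoring loop of [0110] can be sketched in Python as below. This is an illustrative reading, not the patent's own code: `edge_histogram` and `alignment` are hypothetical names, the reduced-resolution index is assumed to be floor(b*x), and a page with fewer than two objects is assumed to score 1.

```python
import math


def edge_histogram(positions, b, width):
    """EdgeCount: counts of edge positions at reduced resolution b (0 < b <= 1)."""
    counts = [0] * (math.floor(b * width) + 1)  # +1 so an edge at x == width fits
    for x in positions:
        counts[math.floor(b * x)] += 1
    return counts


def alignment(counts, A=1.0):
    """Alignment score in [0, 1]; 1 when every edge lands in the same bin."""
    total = sum(counts)  # NumberOfObjects
    if total < 2:
        return 1.0  # assumption: zero or one object counts as aligned
    occupied = [i for i, c in enumerate(counts) if c > 0]
    score = counts[occupied[0]] - 1  # n - 1 zero-length separations in the first bin
    for prev, cur in zip(occupied, occupied[1:]):
        # neighboring entries z = cur - prev apart add A/(A+z); the bin adds n - 1
        score += A / (A + (cur - prev)) + counts[cur] - 1
    return score / (total - 1)
```

Two left edges three bins apart with A = 1 contribute 1/(1+3), giving a score of 0.25; moving them into the same bin raises the score smoothly to 1.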

[0111] The above applies to left edges, right edges, and center positions to calculate alignment measures referred
to as $align_L$, $align_R$, and $align_C$. The only difference is in which edge values fill the EdgeCount histogram array. The
alignment measures for the edges and center are combined in a manner similar to that used to combine the previously
discussed balance measures. Thus: $align_H = 1 - \left(w_L (1 - align_L)^q + w_R (1 - align_R)^q + w_C (1 - align_C)^q\right)^{1/q}$, where $w_L$,
$w_R$, and $w_C$ are weights of the relative importance of each of the three alignments and the exponent $q$ controls how
strongly one alignment dominates.

[0112] In a similar way, alignment measures are calculated for the top, bottom, and vertically centered positions,
referred to herein as $align_T$, $align_B$, and $align_M$. These are combined into a vertical alignment measure $align_V$.
Each of the horizontal and vertical alignments can contribute to the measure of document quality: $V_{alH} = align_H$ and
$V_{alV} = align_V$. Advantageously, the two can also be combined even though both already contribute individually; an
overall alignment measure for a page can be defined as a weighted sum of the horizontal and vertical contributions: $V_{al} = w_V V_{alV} + (1 - w_V) V_{alH}$.
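A minimal sketch of the combinations in [0111] and [0112], assuming the three weights sum to one so that the result stays between 0 and 1; `combine_alignments`, `page_alignment`, and the default values of q and w_V are illustrative choices, not values from the specification.

```python
def combine_alignments(aligns, weights, q=2.0):
    """align_H (or align_V) = 1 - (sum_k w_k * (1 - align_k)**q) ** (1/q)."""
    if len(aligns) != len(weights):
        raise ValueError("one weight per alignment measure")
    misfit = sum(w * (1.0 - a) ** q for a, w in zip(aligns, weights))
    return 1.0 - misfit ** (1.0 / q)


def page_alignment(align_h, align_v, w_v=0.5):
    """V_al = w_V * V_alV + (1 - w_V) * V_alH, with V_alH = align_H and V_alV = align_V."""
    return w_v * align_v + (1.0 - w_v) * align_h
```

Raising q makes the worst of the three alignments dominate the combined value.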
[0113] The alignment, as illustrated in Figure 26, is considered a combination of the left alignment, right alignment, 
top alignment, bottom alignment, vertical center alignment, and horizontal center alignment values described above. 
In Figure 26, the quantized alignment value is derived by combining the left alignment, right alignment, top alignment,
bottom alignment, vertical center alignment, and horizontal center alignment values using an alignment quantizer
or combiner circuit 12.

[0114] Although the illustration shows a circuit for the alignment quantization process, this process may also
be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits;
any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0115] An overall document alignment can be formed as a combination of alignment values determined for separate
pages. Alternatively, an overall document alignment can be calculated by considering all content objects at once without 
separating them according to page. When values from separate pages are combined, an average may be used as the 
combining mechanism, but alternatives are possible. A method of combining that yields a low result if any of the pages 
have low values may be preferred. Techniques such as taking the reciprocal root of the average of reciprocal powers 
are an example of such a combining method. 
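The reciprocal-power combination mentioned above can be sketched as follows; the name `pessimistic_combine` and the default p = 2 are assumptions. Page values are assumed to lie in [0, 1], and a zero-valued page is mapped to 0, which is the limit of the formula.

```python
def pessimistic_combine(page_values, p=2.0):
    """(mean of v**-p) ** (-1/p): the reciprocal root of the average of reciprocal powers."""
    if not page_values:
        raise ValueError("need at least one page value")
    if p <= 0:
        raise ValueError("p must be positive")
    if any(v <= 0 for v in page_values):
        return 0.0  # limit of the formula as any page value approaches 0
    mean = sum(v ** -p for v in page_values) / len(page_values)
    return mean ** (-1.0 / p)
```

For page values 1.0 and 0.25 with p = 1 this gives 0.4, against an arithmetic mean of 0.625, so the weak page pulls the document value down harder.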

[0116] Other measures of the effect of alignment on document aesthetics and on document quality are envisioned 
herein and should be considered within the scope of the present invention, for example, a function of measured re- 
sponses to differing degrees of alignment. 

[0117] As illustrated in Figures 27 to 30, another parameter or factor used in determining aesthetics is the measure-
ment and quantization of the document's regularity. 

[0118] In a preferred embodiment of the present invention, when multiple alignment positions occur, it is best to 
space those alignment positions in a regular fashion. In other words, it is better if rows and columns of a table have 
relatively the same heights and widths. 

[0119] Figure 27 illustrates an example of low position regularity of objects 110 on page 100, while Figure 28 illustrates
an example of high position regularity of objects 110 on page 100. Figure 29 illustrates an example of low spacing 
regularity of objects 110 on page 100, while Figure 30 illustrates an example of high spacing regularity of objects 110 
on page 100. 
[0120] One way to measure regularity is to identify the neighbors of each object 110 and then consider the distance
between corresponding edges of the object and its neighbors (e.g., the left edge of the object and the left edge of its
neighbors). But because the identification of neighbors can be expensive, a simpler approximation is often preferred.
[0121] If the document has been designed such that objects are strongly aligned, there will be a sharp peak in a
histogram of the distances between alignment positions. The alignment positions are the peaks identified in the
alignment histogram described above. This processing can be extended to capture the distances between alignment
peaks and store them in a new histogram, referred to herein as SepCount:

peakCount = 0;
if (EdgeCount[0] > EdgeCount[1])
{
    peakCount++;
    SepCount[1]++;    /* separation from the virtual peak at position -1 */
    prevPeak = 0;
}
else
    prevPeak = -1;

for (i = 1; i < b*W - 1; i++)
    if (EdgeCount[i-1] < EdgeCount[i] && EdgeCount[i+1] < EdgeCount[i])
    {
        peakCount++;
        SepCount[i - prevPeak]++;
        prevPeak = i;
    }

if (EdgeCount[b*W-1] > EdgeCount[b*W-2])
{
    peakCount++;
    SepCount[b*W-1 - prevPeak]++;
}
[0122] Once the SepCount histogram has been created, it is processed in the same way as the EdgeCount histogram
was processed for alignment, with the exception of dividing by peakCount - 1 instead of NumberOfObjects - 1.

i = 0;
while (SepCount[i] == 0)
    i = i + 1;
preg = SepCount[i] - 1;
for (j = i + 1; j < b*W; j++)
    if (SepCount[j] != 0)
    {
        preg = preg + A/(A + j - i) + SepCount[j] - 1;
        i = j;    /* the next separation is measured from this entry */
    }
preg = preg / (peakCount - 1);

[0123] This provides a measure of regularity, but it will be dependent on which alignment measure is used in the 
extraction of alignment position separations. While all six alignments can be used and the results combined, the left 
alignment is preferred for determining horizontal regularity and the top alignment is preferred for finding vertical regu- 
larity. 
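A Python sketch of [0121] and [0122], under stated assumptions: `separation_histogram` and `position_regularity` are hypothetical names, a bin counts as a peak only if it is strictly larger than each neighbor that exists, and, following the pseudocode's prevPeak = -1 start, the first peak's separation is measured from position -1. The input is an EdgeCount histogram, for example one filled with left edges for horizontal regularity.

```python
def separation_histogram(counts):
    """Return (SepCount, peakCount) from an EdgeCount histogram, as in [0121].

    prev starts at -1, so the first peak at index i records a separation of i + 1.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two histogram bins")
    sep = [0] * (n + 1)
    peaks, prev = 0, -1
    for i in range(n):
        neighbours = [counts[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if all(counts[i] > c for c in neighbours):  # strict local maximum
            peaks += 1
            sep[i - prev] += 1
            prev = i
    return sep, peaks


def position_regularity(sep, peaks, A=1.0):
    """preg: the alignment scoring applied to SepCount, normalized by peakCount - 1."""
    if peaks < 2:
        return 1.0  # assumption: fewer than two peaks counts as regular
    occupied = [i for i, c in enumerate(sep) if c > 0]
    score = sep[occupied[0]] - 1
    for prev, cur in zip(occupied, occupied[1:]):
        score += A / (A + (cur - prev)) + sep[cur] - 1
    return score / (peaks - 1)
```

Peaks at bins 2, 5, and 8 are evenly spaced, so every separation lands in SepCount[3] and the score is 1.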

[0124] Advantageously, these regularity measures can be combined into the document quality measure as $V_{RH}$ and
$V_{RV}$, where $V_{RH}$ = preg calculated when EdgeCount is filled with left-edge positions and $V_{RV}$ = preg calculated when
EdgeCount is filled with top-edge positions. An overall position regularity value can be defined as a weighted sum of
the horizontal and vertical contributions.

[0125] Other measures of the effect of position regularity on document aesthetics and on document quality are
envisioned herein and should be considered within the scope of the present invention, for example, a function of measured
responses to differing position regularities.

[0126] A uniform separation between objects can also be calculated to determine document quality. This is a measure
of spacing regularity, preferably calculated in a manner similar to alignment and position regularity. However, in this
instance, the array of data values corresponding to EdgeCount contains the histogram of spacing values between
objects.

[0127] To determine spacing values for horizontal spacing regularity, first determine, for each object, the closest
object (if any) that lies to its right and overlaps it in the vertical direction. The spacing is then the distance from the
right edge of the current object to the left edge of that neighbor. A similar calculation determines separations in the
vertical direction.

[0128] If performance is an issue, an approximation of spacing can be created without the cost of identifying object
neighbors by examining arrays of edge positions (as were generated for the alignment calculation). For horizontal
spacing, step through the array of right-edge positions. For each position, determine from the left-edge array the first
left edge to the right of this location. The separation value is the distance between the right and left edge positions.
To account for the possibility that more than one object may have an edge at these locations, enter into the histogram
the product of the counts of edges from the right- and left-edge histograms at these locations. The sum of these
products is then used to normalize the final result instead of NumberOfObjects as in the alignment calculation.
The approximate separation count is then given by:



totalSepCount = 0;
for (i = 0; i < b*W - 1; i++)
    if (RightEdgeCount[i] != 0)
    {
        /* find the first left edge to the right of this right edge */
        j = i + 1;
        while (j < b*W && LeftEdgeCount[j] == 0)
            j = j + 1;
        if (j < b*W)
        {
            totalSepCount += RightEdgeCount[i] * LeftEdgeCount[j];
            SpacSepCount[j - i] += RightEdgeCount[i] * LeftEdgeCount[j];
        }
    }

Here LeftEdgeCount and RightEdgeCount contain the values of the EdgeCount array when filled with left-edge values 
and right-edge values respectively. For vertical separations the calculation is analogous with the use of top and bottom 
edge values. The calculation of the spatial regularity measure would follow as: 

i = 0;
while (SpacSepCount[i] == 0)
    i = i + 1;
sreg = SpacSepCount[i] - 1;
for (j = i + 1; j < b*W; j++)
    if (SpacSepCount[j] != 0)
    {
        sreg = sreg + A/(A + j - i) + SpacSepCount[j] - 1;
        i = j;
    }
sreg = sreg / (totalSepCount - 1);

[0129] An approximation of the vertical spacing histogram is determined in the same manner using the top and
bottom edge-position arrays. Advantageously, regularity measures can be combined into the document quality measure
as $V_{SH}$ and $V_{SV}$, where $V_{SH}$ = sreg when SpacSepCount is computed from left and right edges, and $V_{SV}$ = sreg when
SpacSepCount is computed from top and bottom edges. An overall separation regularity measure can be defined as
the weighted sum of the horizontal and vertical contributions.
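The approximate spacing histogram of [0128] and the sreg score can be sketched in Python as follows. The names are hypothetical, the two edge histograms are assumed to share one length, a right edge with no left edge further right contributes nothing, and fewer than two counted gaps is assumed to score 1. As in the text, this approximation does not check that the paired objects overlap vertically.

```python
def spacing_histogram(left_counts, right_counts):
    """Approximate SpacSepCount and totalSepCount, following [0128].

    For every occupied right-edge bin i, find the first occupied left-edge bin
    j > i and add right_counts[i] * left_counts[j] at separation j - i.
    """
    n = len(left_counts)
    if len(right_counts) != n:
        raise ValueError("edge histograms must have the same length")
    sep = [0] * n
    total = 0
    for i in range(n):
        if right_counts[i] == 0:
            continue
        j = i + 1
        while j < n and left_counts[j] == 0:
            j += 1
        if j < n:  # no left edge further right means no separation is recorded
            pairs = right_counts[i] * left_counts[j]
            sep[j - i] += pairs
            total += pairs
    return sep, total


def spacing_regularity(sep, total, A=1.0):
    """sreg: the alignment-style score over SpacSepCount, normalized by total - 1."""
    if total < 2:
        return 1.0  # assumption: fewer than two gaps counts as regular
    occupied = [i for i, c in enumerate(sep) if c > 0]
    score = sep[occupied[0]] - 1
    for prev, cur in zip(occupied, occupied[1:]):
        score += A / (A + (cur - prev)) + sep[cur] - 1
    return score / (total - 1)
```

Three objects spanning bins 0 to 2, 4 to 6, and 8 to 10 leave two gaps of 2, so the score is 1; moving the third object to bins 10 to 12 makes the gaps 2 and 4 and lowers the score to 1/3.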

[0130] Other measures of the effect of spacing regularity on document aesthetics and on document quality are
envisioned herein and should be considered within the scope of the present invention, for example, a function of measured
responses to differing spacing regularities.

[0131] As illustrated in Figure 31, another parameter or factor used in determining aesthetics is the measurement 
and quantization of the document's page security. 
[0132] In a preferred embodiment of the present invention, it is desirable that small objects 110 not be positioned at
or near the edge of a page 100, as they appear insecure, as if they could fall off. This is particularly true of objects such as
page numbers placed outside of the margins.

[0133] To quantify the page security of an object, the distance from its center to each of the page edges is determined.
The distance may be weighted according to which edge is used, since an object may appear less secure near the bottom
edge than near the top edge. The minimum weighted distance is retained.

[0134] If the center of object $i$ is at $(x_i, y_i)$ and the page size is $W \times H$, calculate for each object:
$ps_i = \min(s_L x_i, s_T y_i, s_R (W - x_i), s_B (H - y_i))$, where $s_L$, $s_T$, $s_R$, and $s_B$ are the left, top, right, and
bottom edge weights. An overall page security value is defined as the minimum of all the object values for the page:
$PS = \min_i ps_i$. Most objects will appear fine beyond some threshold distance $T$, at and beyond which the property
should have a value of 1. To adjust the measure for this behavior, calculate: $V_{PS} = \min(1, PS \cdot T^{-1})$.
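A minimal sketch of the page security value of [0134]. `page_security` and its keyword arguments are hypothetical names, the edge weights default to 1, and y is measured down from the top edge, as the formula implies. A weight below 1 (for example s_bottom = 0.5) makes objects near that edge score as less secure.

```python
def page_security(centers, W, H, T, s_left=1.0, s_top=1.0, s_right=1.0, s_bottom=1.0):
    """V_PS = min(1, PS / T), where PS is the smallest weighted edge distance.

    centers: (x, y) object centers, with y measured down from the top edge.
    """
    if T <= 0:
        raise ValueError("threshold T must be positive")
    if not centers:
        return 1.0  # assumption: an empty page has no insecure objects
    ps = min(
        min(s_left * x, s_top * y, s_right * (W - x), s_bottom * (H - y))
        for x, y in centers
    )
    return min(1.0, ps / T)
```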

[0135] Other measures of the effect of object position on document aesthetics and on document quality are envi- 
sioned herein and should be considered within the scope of the present invention, for example, a function of measured 
responses to differing positions e.g., insecurity of objects positioned near page edges. 

[0136] As illustrated in Figure 32, another parameter or factor used in determining aesthetics is the measurement
and quantization of the document's optimal proportionality.

[0137] In a preferred embodiment of the present invention, certain proportions are more pleasing than others. An
aspect ratio between width and height of $R = 2/(1 + \sqrt{5}) \approx 0.618$ (the reciprocal of the golden ratio) is often
considered ideal. The ratio of the width and height of the content on a page is determined and compared to this ratio.

[0138] For width and height, the bounding box of the content (1101, 1102, 1103, 1104, and 1105) is preferred. The
bounding box is calculated as follows: step through the content objects and find the minimum left edge, the maximum
right edge, and (measuring top down) the minimum top edge and maximum bottom edge. The width is the difference
between the maximum right edge and the minimum left edge. The height is the difference between the maximum
bottom edge and the minimum top edge.

[0139] Next, determine whether the width or the height is smaller, and divide the smaller by the larger to get the
aspect ratio $A$. The absolute difference from the ideal ratio $R$ is then scaled to give a number between 0 and 1 as
follows: $V_{AR} = 1 - |A - R| / R$.
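The bounding-box and aspect-ratio steps of [0138] and [0139] can be sketched as below. Boxes are assumed to be (left, top, right, bottom) tuples with y measured top down; the names `content_size` and `proportionality` are illustrative, and an empty content area is assumed to score 0.

```python
import math

R_IDEAL = 2.0 / (1.0 + math.sqrt(5.0))  # about 0.618, the reciprocal of the golden ratio


def content_size(boxes):
    """Bounding box (width, height) of (left, top, right, bottom) boxes, y top-down."""
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[2] for b in boxes)
    bottom = max(b[3] for b in boxes)
    return right - left, bottom - top


def proportionality(boxes, R=R_IDEAL):
    """V_AR = 1 - |A - R| / R, where A = smaller side / larger side."""
    small, large = sorted(content_size(boxes))
    if large == 0:
        return 0.0  # assumption: degenerate (empty) content area
    aspect = small / large
    return 1.0 - abs(aspect - R) / R
```

A square content area gives A = 1 and a score of 2 - 1/R, about 0.38, while a 618 by 1000 area scores almost exactly 1.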

[0140] In Figure 32, object 1101 has a good proportionality or aspect ratio, while object 1102 has a poor proportionality
or aspect ratio.

[0141] Other measures of the effect of aspect ratio on document aesthetics and on document quality are envisioned
herein and should be considered within the scope of the present invention, for example, a function derived from measured
human responses to differing aspect ratios.

[0142] Other quantifiable features that contribute to the aesthetics of a document, and thereby to the document quality,
are possible. The particular embodiments described here are meant to illustrate how a quantifiable aesthetic measure
can be constructed and how such measures contribute to document quality, either directly or through the aesthetics.
Their identification should not rule out the use of other features as appropriate.

EASE OF USE 

[0143] For the case of document ease of use, the methods herein are used to generate quantifiable values for the
contributing features of separability, distinguishability, locatability, searchability, and/or group identity. As illustrated in
Figure 35, a combining circuit 20 (the ease of use quantizer 20 of Figure 1) receives measured and/or calculated
quantized values representing separability, distinguishability, locatability, searchability, and/or group identity and
processes these values based upon a predetermined algorithm so as to generate an ease of use quantization value for
the document or portion of the document being analyzed.
[0144] Each value is based on properties inherent in the document itself. The individual values are combined
into an overall value or score for the document. Other methods for measuring, assigning, or otherwise associating a
quantifiable value for document quality should be considered within the scope of the present invention, such that the
present invention is directed not only to the particular methods put forth, but also to the much broader concept of
determining a value for document quality.
[0145] In a preferred embodiment of the present invention, each rule is defined to produce a value ranging between
0 and 1, where 0 means low value and 1 means high value. This enables quantized quality values to be calculated
and combined to form the overall document quality measure.

[0146] If $V_i$ is the value calculated for the $i$th rule, the ease-of-use measure $V_{EU}$ is formed as a function $E$ of these
contributions such that: $V_{EU} = E(V_1, V_2, \ldots, V_N)$. The combining function $E$ can be as simple as a weighted average of
the contributions. However, because any bad contributor can ruin the document quality no matter how good the others
are, a linear combination is not preferred.

[0147] An alternative is: $V_{EU} = \left(\sum_i w_i V_i^{-p}\right)^{-1/p}$. In a preferred embodiment, the $w_i$ factors are weights that
specify the relative importance of each rule and should sum to one. The exponent $p$ introduces a non-linearity that can
make one bad value overwhelm many good ones; the larger the value of $p$, the greater this effect. A further alternative
is: $V_{EU} = \left(\sum_i w_i (d + V_i)^{-p}\right)^{-1/p} - d$, where the weights $w_i$ and the exponent $p$ play the same roles, and the
parameter $d$ is a number slightly larger than 0, which keeps each term finite when a contribution $V_i$ is zero.

[0148] Other combining functions are, for example, the product of the contributions. If weighting of the contributions
is desired, this can be achieved by: $V_{EU} = \prod_i V_i^{w_i}$.
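The combining functions of [0147] and [0148] can be sketched in Python as below, assuming the weights sum to one and every V_i lies in [0, 1]; `combine_power`, `combine_product`, and the default p = 4 are illustrative choices, not values from the specification.

```python
import math


def combine_power(values, weights, p=4.0, d=0.0):
    """V_EU = (sum_i w_i * (d + V_i)**-p) ** (-1/p) - d, with weights summing to one.

    d = 0 is the first form of [0147]; a small d > 0 keeps every term finite
    when some V_i is zero.
    """
    if d == 0.0 and any(v == 0.0 for v in values):
        return 0.0  # limit of the d = 0 form when a value is zero
    s = sum(w * (d + v) ** -p for v, w in zip(values, weights))
    return s ** (-1.0 / p) - d


def combine_product(values, weights):
    """V_EU = prod_i V_i ** w_i, the weighted product of [0148]."""
    return math.prod(v ** w for v, w in zip(values, weights))
```

With equal weights, values 1.0 and 0.1 combine to about 0.12 under p = 4, far below their weighted average of 0.55, which is the intended effect of letting one bad value dominate.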

[0149] Although the illustrations show a circuit for the ease-of-use quantization process, this process
may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific
circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0150] As with the measurement of aesthetics, the measurement of ease of use requires the identification of quan- 
tifiable features that contribute to the ease of use. Examples of methods to measure and combine such features are 
provided. 

[0151] The features first considered are those that relate to the logical structure of the document, that is, to the
organization of the document content into groups. In evaluating document quality, content objects of interest need to
be identified as to what kind of content they are (e.g., images, paragraphs, headings, titles, blocks, borders,
lists, tables, etc.). This will, of course, be highly dependent upon the kind of document the document's creator or
developer envisions, is creating, or has already created.

[0152] Once the document content of interest has been identified, the content needs to be characterized, as illustrated
in Figure 33, as to how it is intended to be grouped, such that content can be distinguished from other content,
from other content groups, from other content group members (1104, 1105, 1106) or elements, and from neighboring
content (1101, 1102, 1103). This can be effectuated by parsing content objects of interest into a tree structure of
content, as illustrated in Figure 34, wherein nodes 135 of the content tree are content groups (e.g., lists, tables, etc.)
and leaves of the branches 130 of the content tree are content elements (e.g., paragraphs, images, and the like).
Creating content trees, branches, and nodes, and traversing such trees, are well understood by those skilled in
computer science.

[0153] Once a content tree has been created, content neighboring the content object(s) of interest needs to
be identified. One procedure traverses up the content tree and identifies neighboring branches. Another then moves
down the content tree, examining elements on the identified neighboring branches. In this manner, content neighboring
the content of interest can be identified.

[0154] First a neighbor list associated with content group G is initialized to an empty list. The content tree is traversed 
upward to identify branches neighboring content group G. The content tree is then traversed downward such that 
elements of the identified content branches can be examined. Branches are pruned that are considered to exceed a 
predetermined distance from the node of the group G. Only branches considered as 'nearby' are recursively analyzed. 

Although the process described herein involves identifying neighbors N of group G, it should be understood that nothing
requires group G to actually comprise a group of content, as group G can be a single element (paragraph, image,
etc.) of content.

[0155] The procedure IsNeighbor(G, N) is used herein to ascertain whether or not a node N is within a threshold
distance of content group G, such that node N is to be considered a neighbor N of group G. This can be readily
effectuated by calculating a distance between group G and neighbor N and comparing that distance to a threshold
variable CloseEnough, so as to determine whether Distance(G, N) < CloseEnough.

[0156] Distance can be the distance between content borders or, alternatively, the distance between content centers.
With respect to the former, if the content centers of group G and neighbor N are $(x_G, y_G)$ and $(x_N, y_N)$, and the
widths and heights of group G and neighbor N are $(w_G, h_G)$ and $(w_N, h_N)$ respectively, then the distance can be
readily computed by the relationship: $\max(|x_G - x_N| - (w_G + w_N)/2, 0) + \max(|y_G - y_N| - (h_G + h_N)/2, 0)$. More
complex distance calculations, such as the minimum Euclidean distance between corners, can also be used.

[0157] The threshold CloseEnough can either be a constant or be adjustable with respect to content size. One can
use the square root of the area of object G to determine a threshold value such that: $\text{CloseEnough} = \text{Area}(G)^{1/2}$. This
can also be scaled by a factor $S$, where $S$ is typically close to 1, such that: $\text{CloseEnough} = S \cdot \text{Area}(G)^{1/2}$.
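IsNeighbor can be sketched with the border distance of [0156] and the scaled threshold of [0157]. The `Box` type and the function names are hypothetical, and Area(G) is assumed to be the bounding-box area w * h.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # center x
    y: float  # center y
    w: float  # width
    h: float  # height


def border_distance(g, n):
    """Gap between the borders of two boxes (0 when they touch or overlap)."""
    dx = max(abs(g.x - n.x) - (g.w + n.w) / 2, 0.0)
    dy = max(abs(g.y - n.y) - (g.h + n.h) / 2, 0.0)
    return dx + dy


def is_neighbor(g, n, S=1.0):
    """IsNeighbor(G, N): Distance(G, N) < CloseEnough = S * Area(G) ** 0.5."""
    close_enough = S * math.sqrt(g.w * g.h)
    return border_distance(g, n) < close_enough
```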
[0158] The methods provided for evaluating distance or determining threshold are not to be considered as limiting
in scope. Other methods for determining a distance measure for content objects should be considered within the scope 
of the present invention; such that the present invention is directed to the much broader concept of using a measure 
of distance between content objects in the context of evaluating document quality. 
[0159] The following pseudocode illustrates how the content tree can be traversed. It should be understood that the
pseudocode provided herein is illustrative and, as such, is intended to be modified by one skilled in the art of computer
science and programming, without undue experimentation, to effectuate implementation in one's own system.
Note that group G is the content currently under examination, C and P are nodes, and N is used as a convenience
index to identify the node being examined.

TraverseUp(G, C)
{
    if node C is the root node then return    /* done */
    P = parent(C)
    for each child node N of parent P
        if child N is different from C then
            TraverseDown(G, N)
    TraverseUp(G, P)
    return
}

TraverseDown(G, N)
{
    if IsNeighbor(G, N)
        then add node N to the list of neighbors of group G
        otherwise return    /* prune: N and its descendants are too far away */
    if node N is not a leaf node
        then for each child C of node N
            TraverseDown(G, C)
    return
}

[0160] The depth in the tree of neighbor node N relative to content group G can be obtained by adding a depth d 
parameter wherein d+1 is passed in the recursive call to TraverseUp and wherein depth d-1 is passed in the recursive 
call to TraverseDown. The initial value of depth for d would be zero, i.e., TraverseUp(G, G, 0). Depth can be stored 
along with other information on the previously described list of neighbor nodes of group G. 
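A Python sketch of TraverseUp and TraverseDown with the depth parameter of [0160]. The `Node` class is a hypothetical tree representation, `is_neighbor` is passed in as a callable, and the call from TraverseUp into TraverseDown is assumed to pass d unchanged, so siblings of G have depth 0, siblings of G's parent have depth 1, and each step down subtracts 1.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def find_neighbors(g: Node, is_neighbor: Callable[[Node, Node], bool]) -> List[Tuple[Node, int]]:
    """Collect (node, depth) pairs near g via TraverseUp / TraverseDown."""
    found: List[Tuple[Node, int]] = []

    def traverse_down(n: Node, d: int) -> None:
        if not is_neighbor(g, n):
            return  # prune: skip n and all of its descendants
        found.append((n, d))
        for c in n.children:
            traverse_down(c, d - 1)

    def traverse_up(c: Node, d: int) -> None:
        p = c.parent
        if p is None:  # c is the root: done
            return
        for n in p.children:
            if n is not c:
                traverse_down(n, d)
        traverse_up(p, d + 1)

    traverse_up(g, 0)
    return found
```

In a tree root to (sec1 to (p1, p2), sec2 to (p3)), starting from p1 with every node accepted yields p2 at depth 0, sec2 at depth 1, and p3 at depth 0; rejecting sec2 prunes p3 as well.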

[0161] Once the document's content has been parsed and neighboring content has been identified for all content 
objects of interest, various properties respecting content separation can then be determined which will be subsequently 
used to quantify document quality. 

[0162] As illustrated in Figures 36 to 41, another parameter or factor used in determining ease of use is the
measurement and quantization of the document's separability.

[0163] In a preferred embodiment of the present invention, a document's degree of overall separability can be
ascertained by determining the degree of total separability for the document's content objects of interest.
Individual measures for content object separation include: spatial separation (Figure 37), alignment separation (Figure
38), style separation (Figure 39), background separation (Figure 40), and inherent separation (Figure 41), among
others.

[0164] A combination of separation measures for content, as illustrated in Figure 36, is then useful in evaluating the
degree of effective separation of the document content. Effective separation is in turn useful in evaluating the degree
of total separation of the document content, which, in turn, is useful in evaluating the document's degree or measure
of overall separation. Overall separation is subsequently used in assessing document quality.

[0165] More specifically, the effective separability, as illustrated in Figure 36, is considered a combination of the
spatial separation, alignment separation, style separation, background separation, and/or inherent separation. In Figure
36, the quantized effective separability value is derived by combining the spatial separation, alignment separation, style
separation, background separation, and/or inherent separation using an effective separability quantizer or combiner
circuit 21.

[0166] Although the illustration shows a circuit for the effective separability quantization process, this process
may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific
circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0167] As illustrated in Figure 37, another parameter or factor used in determining ease of use is the measurement
and quantization of the document's spatial separation.

[0168] In a preferred embodiment of the present invention, the spatial separation (SpatialSep) for a group or element
can be the minimum of the separation distances between the group or element and each identified neighbor. Using
the dimensions of the bounding boxes (i.e., center position, width, and height) of the content under evaluation, spatial
separation can be computed from the horizontal and vertical distances between the boxes, each with a floor of zero.
This can be further normalized to yield a value between 0 and 1 by scaling with a maximum separation factor (e.g.,
the width $W_p$ and height $H_p$ of the page), such that:
$\left(\max(|x_G - x_N| - (w_G + w_N)/2, 0)/W_p + \max(|y_G - y_N| - (h_G + h_N)/2, 0)/H_p\right)/2$.
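A minimal sketch of SpatialSep for one group, assuming (center x, center y, width, height) tuples and treating a group with no identified neighbors as fully separated; the function name is illustrative.

```python
def spatial_separation(g, neighbors, page_w, page_h):
    """SpatialSep: smallest normalized border gap between g and its neighbors.

    Boxes are (cx, cy, w, h) tuples; each gap is the average of the horizontal
    gap over the page width and the vertical gap over the page height.
    """
    def gap(n):
        gx = max(abs(g[0] - n[0]) - (g[2] + n[2]) / 2, 0.0) / page_w
        gy = max(abs(g[1] - n[1]) - (g[3] + n[3]) / 2, 0.0) / page_h
        return (gx + gy) / 2

    if not neighbors:
        return 1.0  # assumption: no neighbors means fully separated
    return min(gap(n) for n in neighbors)
```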
[0169] The particular method provided for evaluating spatial distances between content objects is exemplary and
is not to be considered as limiting in scope. Other methods should be considered within the scope of the present
invention, for example, a function of measured human responses to differing spatial separations, such that the present
invention is directed to the much broader concept of using a measure of spatial separation of content objects in a
determination of total separability in the context of evaluating document quality.

[0170] As illustrated in Figure 38, another parameter or factor used in determining ease of use is the measurement 
and quantization of the document's alignment separation. 

[0171] In a preferred embodiment of the present invention, alignment, as used herein in connection with alignment
separation, means that one or more positions of object G on a particular page match a corresponding position of
neighboring content N. Alignment separation is how well content avoids having corresponding positional matches within
a page. Using the left, right, top, and bottom page positions $(x_{GL}, x_{GR}, y_{GT}, y_{GB})$ of group G (110) and
$(x_{NL}, x_{NR}, y_{NT}, y_{NB})$ of neighbor N (1101), alignment separation is the minimum of the absolute differences of their
corresponding positions, given by: $\min(|x_{GL} - x_{NL}|, |x_{GR} - x_{NR}|, |y_{GT} - y_{NT}|, |y_{GB} - y_{NB}|)$.
[0172] Alignment separation can be further normalized to a value between 0 and 1 by dividing by the maximum possible
difference in positions (page width $W_p$ and page height $H_p$) of the document page upon which the content resides, as
expressed by: $\min(|x_{GL} - x_{NL}|/W_p, |x_{GR} - x_{NR}|/W_p, |y_{GT} - y_{NT}|/H_p, |y_{GB} - y_{NB}|/H_p)$.

[0173] Alternatively, alignment separation can be measured by the sum of the alignment separations between multiple
edges, as given by: $\min\left((|x_{GL} - x_{NL}| + |x_{GR} - x_{NR}|)/W_p, (|y_{GT} - y_{NT}| + |y_{GB} - y_{NB}|)/H_p\right)$. Alternatively:
$\min\left(\max(|x_{GL} - x_{NL}|/W_p, |x_{GR} - x_{NR}|/W_p), \max(|y_{GT} - y_{NT}|/H_p, |y_{GB} - y_{NB}|/H_p)\right)$.
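The three alignment separation variants of [0172] and [0173] can be sketched together. Positions are assumed to be (left, right, top, bottom) tuples, and the `mode` switch is a hypothetical convenience, not part of the specification.

```python
def alignment_separation(g, n, page_w, page_h, mode="min"):
    """Normalized alignment separation between group g and neighbor n.

    g, n: (left, right, top, bottom) page positions.
    mode "min": smallest single-edge difference ([0172]);
    mode "sum": smaller of the summed horizontal and summed vertical differences;
    mode "max": smaller of the larger horizontal and larger vertical difference ([0173]).
    """
    dl = abs(g[0] - n[0]) / page_w
    dr = abs(g[1] - n[1]) / page_w
    dt = abs(g[2] - n[2]) / page_h
    db = abs(g[3] - n[3]) / page_h
    if mode == "min":
        return min(dl, dr, dt, db)
    if mode == "sum":
        return min(dl + dr, dt + db)
    if mode == "max":
        return min(max(dl, dr), max(dt, db))
    raise ValueError(f"unknown mode: {mode}")
```

A shared left edge drives the "min" variant to 0, while the "sum" and "max" variants only reach 0 when both horizontal or both vertical edges match.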

[0174] The methods for evaluating alignment and alignment separation herein are exemplary and are not to be
considered as limiting in scope. Other methods should be considered within the scope of the present invention, for
example, a function of measured human responses to differing alignment separation amounts, such that the present
invention is directed to the much broader concept of using a measure of alignment separation of content objects in a
determination of total separability in the context of evaluating document ease of use and document quality.

[0175] As illustrated in Figure 39, another parameter or factor used in determining ease of use is the measurement 
and quantization of the document's style separation. 

[0176] In a preferred embodiment of the present invention, style separation (StyleSep) is used herein to provide a means by which objects can be further distinguished. To obtain the degree of style separation, each content type needs to be compared against every other content type and a value assigned for the amount of style separation between them. The assignment of such a value is a judgment call by the document developer. For example, one document developer may consider it easier to distinguish TEXT from an IMAGE than to distinguish a LIST from a TABLE. That developer would therefore assign a much smaller style separation value to types LIST vs. TABLE, because it is much more difficult to distinguish between these two types of content.

[0177] In other words, the degree of style separation is small. By contrast, as previously mentioned, the developer may consider it much easier to distinguish between TEXT and IMAGE content. The separation in style is then high, so type TEXT vs. type IMAGE would be assigned a high value in the table of style separations, e.g., TypeSepTable, which is preferably multi-dimensional and indexed by type.

[0178] The table of style separation values (TypeSepTable) contains a value for every type vs. every other type. For instance, content type IMAGE would be assigned a style separation value against all other types of content (e.g., TEXT, GRAPHIC, LIST, TABLE, etc.). As mentioned, the IMAGE vs. TEXT pair would have one value for its degree of style separation, and the IMAGE vs. GRAPHIC pair would have its own value. All values are stored in a manner that makes the degree of style separation between any two content types readily retrievable.

[0179] Once the style separation table has been generated, the value for the separation of style between content group object G and identified neighbor N is readily retrieved from it using a function, referred to herein as type(), which returns a number for a content type. The predetermined separation between the two content types is retrieved by indexing the table of style separation values with type(G) and type(N), i.e., StyleSep = TypeSepTable[type(G)][type(N)].

[0180] When the two objects are of the same type, the style values of one object can be compared with the corresponding style values of the other, and a style difference calculated for each style value pair. For numeric parameters such as font size or line spacing, the style difference can simply be the absolute difference of the values. For multidimensional values such as color, the style difference can be the distance between the values. For enumerated values such as quadding, font family or font style, a two-dimensional look-up table indexed by the enumerated values of the two objects can be used to retrieve the difference. The overall style separation then becomes the weighted sum of the style differences available for the object type, for example:
$\mathrm{StyleSep} = \sum_i w_i \, d_i(G, N)$,
where the sum is over the available style parameters $i$, $w_i$ is the weight of the $i$th style parameter, and $d_i$ is the difference measure for the $i$th style parameter.
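A minimal Python sketch of [0179] and [0180] follows. The table entries, the quadding look-up table, the choice of style parameters and their weights are all hypothetical placeholders for the developer judgments the text describes; colors are assumed to be coordinate tuples in some color space so that `math.dist` gives their distance.

```python
from math import dist

# Hypothetical developer-chosen TypeSepTable, indexed [type(G)][type(N)].
TYPE_SEP_TABLE = {
    "TEXT":  {"TEXT": 0.0, "IMAGE": 0.9, "LIST": 0.4, "TABLE": 0.5},
    "IMAGE": {"TEXT": 0.9, "IMAGE": 0.0, "LIST": 0.9, "TABLE": 0.9},
    "LIST":  {"TEXT": 0.4, "IMAGE": 0.9, "LIST": 0.0, "TABLE": 0.2},
    "TABLE": {"TEXT": 0.5, "IMAGE": 0.9, "LIST": 0.2, "TABLE": 0.0},
}

# Hypothetical 2-D look-up table for one enumerated style value (quadding).
QUADDING = ["left", "center", "right", "justify"]
QUADDING_DIFF = [
    [0.0, 0.6, 0.8, 0.3],
    [0.6, 0.0, 0.6, 0.5],
    [0.8, 0.6, 0.0, 0.3],
    [0.3, 0.5, 0.3, 0.0],
]


def style_differences(g: dict, n: dict) -> dict:
    """Per-parameter differences d_i(G, N) for two objects of the same type."""
    qi = QUADDING.index(g["quadding"])
    qj = QUADDING.index(n["quadding"])
    return {
        "font_size": abs(g["font_size"] - n["font_size"]),   # numeric
        "color": dist(g["color"], n["color"]),                # multidimensional
        "quadding": QUADDING_DIFF[qi][qj],                    # enumerated
    }


def style_sep(g: dict, n: dict, weights: dict) -> float:
    """StyleSep: table look-up for different types, weighted sum otherwise."""
    if g["type"] != n["type"]:
        return TYPE_SEP_TABLE[g["type"]][n["type"]]
    diffs = style_differences(g, n)
    # The weights also put the differently scaled parameters on a common scale.
    return sum(weights[name] * d for name, d in diffs.items())


para_a = {"type": "TEXT", "font_size": 12, "color": (0, 0, 0), "quadding": "left"}
para_b = {"type": "TEXT", "font_size": 14, "color": (3, 4, 0), "quadding": "center"}
weights = {"font_size": 0.1, "color": 0.01, "quadding": 0.5}
print(style_sep(para_a, para_b, weights))   # 0.1*2 + 0.01*5 + 0.5*0.6
```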

[0181] The particular methods for evaluating style separation herein are exemplary and are not to be considered as limiting in scope. Other methods for determining style separation should be considered within the scope of the present invention, for example, a function of measured human responses to differing styles; such that the present invention is directed not only to the particular method of determining style separation, but also to the much broader concept of using a measure of style separation in a determination of content separability in the context of evaluating document ease of use and document quality.

[0182] As illustrated in Figure 40, another parameter or factor used in determining ease of use is the measurement 
and quantization of the document's background separation. 

[0183] In a preferred embodiment of the present invention, objects on different color backgrounds can be considered separate and distinct. Thus, background separation can be thought of as the difference in the backgrounds 1102 of two objects (110 and 1101). If, for instance, background color 1102 is a style parameter of the object G (110) or of one of its ancestors, the content tree is searched upward until the first object with a specified background is found. The following pseudocode illustrates this.

FindBackground(G)
{
    if G specifies a background color
        then return that color
    otherwise if G is the root of the content tree
        then return the default background color (e.g. white)
    otherwise
        return FindBackground( parent(G) )
}

If, on the other hand, backgrounds are content objects, such as rectangles that are members of the same group (or 
perhaps a parent group) as the object in question, another search has to be done. The pseudocode is as follows: 

FindBackground(G, C)
{
    if C is the root of the content tree
        then return the default background color (e.g. white)
    P = parent(C)
    for each child K of P
        if K is different from C and K is a rectangle and K encloses G
            then return the color of K
    return FindBackground(G, P)
}

The search is started with C = G, i.e., FindBackground(G, G). The test for K enclosing G can be performed, for example, using the bounding boxes of K and G to ensure that the corners of the bounding box of G lie within the corners of the bounding box of K.
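Both FindBackground variants can be sketched in Python as follows. This is an illustration under stated assumptions: the `Node` class, its fields, the `(left, top, right, bottom)` bounding-box convention with y increasing downward, and white as the default background are all hypothetical choices, not part of the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Color = Tuple[float, float, float]
WHITE: Color = (1.0, 1.0, 1.0)   # assumed default background color


@dataclass(eq=False)
class Node:
    """Hypothetical content-tree node."""
    kind: str                                  # e.g. "group", "paragraph", "rectangle"
    bbox: Tuple[float, float, float, float]    # (left, top, right, bottom), y grows down
    background: Optional[Color] = None         # background style parameter, if set
    color: Optional[Color] = None              # fill color of a rectangle object
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def find_background_style(g: Node) -> Color:
    """First variant: background is a style parameter of G or an ancestor."""
    if g.background is not None:
        return g.background
    if g.parent is None:                       # G is the root
        return WHITE
    return find_background_style(g.parent)


def encloses(k: Node, g: Node) -> bool:
    """True if the corners of G's bounding box lie within K's bounding box."""
    kl, kt, kr, kb = k.bbox
    gl, gt, gr, gb = g.bbox
    return kl <= gl and kt <= gt and gr <= kr and gb <= kb


def find_background_object(g: Node, c: Optional[Node] = None) -> Color:
    """Second variant: background is a rectangle sibling of G or of an ancestor."""
    c = g if c is None else c                  # start the search with C = G
    if c.parent is None:                       # C is the root
        return WHITE
    p = c.parent
    for k in p.children:
        if k is not c and k.kind == "rectangle" and encloses(k, g):
            return k.color
    return find_background_object(g, p)


root = Node("group", (0, 0, 612, 792))
box = root.add(Node("group", (50, 50, 300, 300)))
box.add(Node("rectangle", (40, 40, 310, 310), color=(0.9, 0.9, 1.0)))
para = box.add(Node("paragraph", (60, 60, 200, 120)))
print(find_background_object(para))   # (0.9, 0.9, 1.0): the enclosing panel
print(find_background_style(para))    # (1.0, 1.0, 1.0): no style background set
```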

[0184] Once the backgrounds for two objects have been determined, a difference measure can be derived. Differ- 
ences in color can be determined using the distance in a color space that strives for visual uniformity such as L*a*b* 
coordinates. Other color spaces can be used as well. 

[0185] The measure of background separation should not simply be the distance between colors in color space because, once the colors are sufficiently different to tell apart easily, further differences between them do nothing to increase separability. What is preferred is a function of distance that is 1 for all values of color difference except those close to zero. One way to obtain this is to scale the color difference $D_c$ by a large factor $s$ and then clamp the result to 1.
[0186] For example: $\mathrm{BackgroundSep} = \min(s \cdot D_c, 1)$. An alternative is to take the $r$th root of the color difference $D_c$, normalized to the range 0 to 1, for example: $\mathrm{BackgroundSep} = D_c^{1/r}$. Here, the larger the value of $r$, the more closely the colors have to match before they fail to provide background separation.
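The two background separation functions can be sketched as below. The sketch assumes background colors are already available as L*a*b* triples and uses plain Euclidean distance (the CIE76 color difference) for $D_c$; `dc_max`, the constant used to bring $D_c$ into the range 0 to 1 for the root variant, is a hypothetical parameter that the text does not specify.

```python
from math import dist
from typing import Tuple

Lab = Tuple[float, float, float]


def color_difference(c1: Lab, c2: Lab) -> float:
    """D_c as Euclidean distance between L*a*b* coordinates (CIE76)."""
    return dist(c1, c2)


def background_sep_clamped(c1: Lab, c2: Lab, s: float) -> float:
    """BackgroundSep = min(s * D_c, 1)."""
    return min(s * color_difference(c1, c2), 1.0)


def background_sep_root(c1: Lab, c2: Lab, r: float, dc_max: float) -> float:
    """BackgroundSep = D_c ** (1/r), with D_c first scaled into [0, 1].

    dc_max is a hypothetical normalizing constant (the largest difference
    expected between two background colors); it is not given in the text.
    """
    dc = min(color_difference(c1, c2) / dc_max, 1.0)
    return dc ** (1.0 / r)


grey = (50.0, 0.0, 0.0)
near_grey = (53.0, 4.0, 0.0)      # D_c = 5
print(background_sep_clamped(grey, near_grey, s=0.1))
print(background_sep_clamped(grey, near_grey, s=1.0))   # clamped to 1.0
```

A larger `s` (or a larger `r` in the root form) makes even small color differences count as full separation.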
[0187] The particular methods for evaluating background separation herein are exemplary and are not to be considered as limiting in scope. Other methods for determining background separation should be considered within the scope of the present invention, for example, a function of measured human responses to differing backgrounds; such that the present invention is directed not only to the particular method of computing background separation, but also to the much broader concept of using a measure of background separation in a determination of content separability in the context of evaluating document ease of use and document quality.

[0188] As illustrated in Figure 41, another parameter or factor used in determining ease of use is the measurement
and quantization of the document's inherent separation.

[0189] In a preferred embodiment of the present invention, features are often built into content objects themselves. Such features are considered inherent to the object. Examples are an object's border 1103, an indented first line, or another feature that inherently indicates a separation from other objects. Spacing before or after a paragraph that differs from its internal line spacing can also signal a separation. Further, some separators only distinguish on a single boundary, e.g., indicating separation at the top but not at the sides.

[0190] As such, to calculate inherent separation, each of the four sides of the object under scrutiny needs to be considered separately. For instance, suppose $w_i$ is a weight that describes the relative importance of the $i$th feature $fTop_i(G)$ to the top boundary; these weights should sum to 1. Suppose also that a parameter $P$ determines how strongly a successful separation feature overwhelms other features, and that $c$ is a constant that should be close to 1 but may be slightly larger to avoid division by 0. Then inherent separation at the top can be defined by:
$\mathrm{InherentSepTop} = c - \left[\sum_i w_i \,(c - fTop_i(G))^{-P}\right]^{-1/P}$.
Similar expressions define InherentSepBottom, InherentSepLeft and InherentSepRight.
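The inherent separation formula is a weighted power combination, which can be sketched as follows. It assumes each feature value $fTop_i(G)$ has already been scored between 0 and 1; the two example features (a top border and extra space before the paragraph), their scores, weights and the choice P = 4 are hypothetical.

```python
from typing import Sequence


def power_combine(values: Sequence[float], weights: Sequence[float],
                  p: float, c: float = 1.001) -> float:
    """c - [sum_i w_i (c - f_i)^(-p)]^(-1/p).

    Values are expected in [0, 1] and weights to sum to 1.  A value near 1
    makes its term huge, so the result is pulled close to 1; if every value
    is equal, the result equals that value.
    """
    if any(not 0.0 <= f <= 1.0 for f in values):
        raise ValueError("feature values must lie in [0, 1]")
    total = sum(w * (c - f) ** (-p) for w, f in zip(weights, values))
    return c - total ** (-1.0 / p)


# Hypothetical top-boundary features of a paragraph:
# f_0 = 1 if it has a top border, f_1 = strength of extra space before it.
inherent_sep_top_bordered = power_combine([1.0, 0.2], [0.6, 0.4], p=4)
inherent_sep_top_plain = power_combine([0.0, 0.2], [0.6, 0.4], p=4)
print(inherent_sep_top_bordered, inherent_sep_top_plain)
```

The bordered case comes out close to 1 because one decisive feature is enough, which is the "overwhelms other features" behavior controlled by P.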

[0191] One of these InherentSep values may be more appropriate for neighbor N, depending upon whether N is mostly above, below, left or right of object G. For example, given:

q1 = w_G * (y_N - y_G) + h_G * (x_N - x_G), and
q2 = w_G * (y_N - y_G) - h_G * (x_N - x_G), then:

if q1 > 0
    then if q2 > 0
        then use InherentSepTop
        otherwise use InherentSepRight
    otherwise if q2 > 0
        then use InherentSepLeft
        otherwise use InherentSepBottom.



[0192] Note that neighbor N will also have an inherent separation. Thus, the complementary inherent separations 
from both object G and neighbor N can be combined as well. For example, if neighbor N is substantially above object 
G, then use the sum of InherentSepTop of G and InherentSepBottom of N. Alternatively, the maximum of the comple- 
mentary inherent separations from G and N can be used. The InherentSep from a neighbor is one of the top, bottom, 
left or right Inherent Separations as chosen above. 
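The side selection of [0191] and the complementary combination of [0192] can be sketched as follows. This assumes (x_G, y_G) and (x_N, y_N) are the center positions of G and N, that w_G and h_G are the width and height of G, and that y increases upward (the rule treats a positive y_N - y_G as "above"); the per-side dictionaries are a hypothetical representation.

```python
OPPOSITE = {"top": "bottom", "bottom": "top", "left": "right", "right": "left"}


def facing_side(xg: float, yg: float, wg: float, hg: float,
                xn: float, yn: float) -> str:
    """Which side of G faces neighbor N, using the diagonals of G's box."""
    q1 = wg * (yn - yg) + hg * (xn - xg)
    q2 = wg * (yn - yg) - hg * (xn - xg)
    if q1 > 0:
        return "top" if q2 > 0 else "right"
    return "left" if q2 > 0 else "bottom"


def pair_inherent_sep(sep_g: dict, sep_n: dict, side: str,
                      use_max: bool = False) -> float:
    """Combine G's facing side with N's complementary side ([0192])."""
    a = sep_g[side]
    b = sep_n[OPPOSITE[side]]
    return max(a, b) if use_max else a + b


sep_g = {"top": 0.3, "bottom": 0.0, "left": 0.0, "right": 0.1}
sep_n = {"top": 0.0, "bottom": 0.5, "left": 0.0, "right": 0.0}
side = facing_side(0, 0, 200, 50, 10, 120)   # N mostly above G
print(side, pair_inherent_sep(sep_g, sep_n, side, use_max=True))   # top 0.5
```

The summed form can exceed 1; if it feeds a combination that expects values between 0 and 1, clamping it is one option (an assumption, not stated in the text).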

[0193] The particular methods for evaluating inherent separation herein are exemplary and are not to be considered as limiting in scope. Other methods for determining inherent separation should be considered within the scope of the present invention, for example, a function of measured human responses to differing inherent separation features; such that the present invention is directed not only to the particular method of computing inherent separation, but also to the much broader concept of using a measure of inherent separation in a determination of content separability in the context of evaluating document ease of use and document quality.

[0194] As illustrated in Figure 36, another parameter or factor used in determining ease of use is the measurement 
and quantization of the document's effective separation. 

[0195] In a preferred embodiment of the present invention, the contributions to the measure of separability can be combined to form the content object's degree of Effective Separation (EffectiveSep) from a particular neighbor, given by:
$\mathrm{EffectiveSep} = c - \left[w_x (c - \mathrm{SpatialSep})^{-p} + w_a (c - \mathrm{AlignmentSep})^{-p} + w_s (c - \mathrm{StyleSep})^{-p} + w_b (c - \mathrm{BackgroundSep})^{-p} + w_n (c - \mathrm{InherentSep})^{-p}\right]^{-1/p}$,
where the weights $w_x$, $w_a$, $w_s$, $w_b$ and $w_n$ sum to 1. While other methods of combining the individual separation measures are possible, this one has the property that if any of the separation values between object G and neighbor N is close to 1, the Effective Separation will also be close to 1.
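EffectiveSep can be sketched directly from the formula. The equal default weights, p = 2 and c = 1.001 are hypothetical settings chosen only for illustration, and the five inputs are assumed to have been computed already on a 0 to 1 scale.

```python
def effective_sep(spatial: float, alignment: float, style: float,
                  background: float, inherent: float,
                  weights=(0.2, 0.2, 0.2, 0.2, 0.2),
                  p: float = 2.0, c: float = 1.001) -> float:
    """EffectiveSep between object G and one neighbor N ([0195])."""
    seps = (spatial, alignment, style, background, inherent)
    if any(not 0.0 <= s <= 1.0 for s in seps):
        raise ValueError("separation measures must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = sum(w * (c - s) ** (-p) for w, s in zip(weights, seps))
    return c - total ** (-1.0 / p)


# Weak spatial and alignment cues, but a strong style difference.
print(effective_sep(0.1, 0.0, 1.0, 0.2, 0.0))
```

A single strong cue (here StyleSep = 1) is enough to make the effective separation close to 1, as the text requires.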
[0196] The particular method for evaluating effective separation herein is exemplary and not to be considered as limiting in scope. Other methods for determining effective separation should be considered within the scope of the present invention, for example, a function of measured human responses to differing separation devices; such that the present invention is directed not only to the particular method of determining effective separation, but also to the much broader concept of using a measure of effective separation of content in a determination of content separability in the context of evaluating document ease of use and document quality.

[0197] In a preferred embodiment of the present invention, an overall measure of total separation, i.e., an object's total separation from all of its neighbors, is obtained by determining the minimum of the effective separations between object G and all of its neighbors.

[0198] In this embodiment, the separation values for each neighbor are combined. Total separation can be given by $\mathrm{TotalSep} = \min_i(\mathrm{EffectiveSep}_i)$, where $\mathrm{EffectiveSep}_i$ is the EffectiveSep value for the $i$th neighbor and the minimum is taken over all neighbors. Alternatives based on averaged separations are also envisioned. An averaging method that gives the greatest weight to the smallest separation can be defined by the reciprocal root of the mean of reciprocal powers, for example:
$\mathrm{TotalSep} = \left[(1/n) \sum_i (c + \mathrm{EffectiveSep}_i)^{-p}\right]^{-1/p} - c$.
Here, $n$ is the number of neighbors, $c$ is a small constant to guard against division by zero, and the power $p$ determines how strongly small separations dominate. If an object has no neighbors, its TotalSep value should be 1.
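Both TotalSep definitions of [0198] can be sketched as follows; the defaults p = 2 and c = 0.01 are hypothetical illustration values. The power mean always lies between the minimum and the ordinary average, and moves toward the minimum as p grows.

```python
from typing import Sequence


def total_sep_min(effective: Sequence[float]) -> float:
    """TotalSep as the minimum EffectiveSep over all neighbors."""
    return min(effective) if effective else 1.0   # no neighbors -> 1


def total_sep_mean(effective: Sequence[float], p: float = 2.0,
                   c: float = 0.01) -> float:
    """TotalSep = [(1/n) sum (c + e_i)^(-p)]^(-1/p) - c; small values dominate."""
    if not effective:
        return 1.0                                 # no neighbors -> 1
    n = len(effective)
    mean = sum((c + e) ** (-p) for e in effective) / n
    return mean ** (-1.0 / p) - c


print(total_sep_min([0.2, 0.8]), total_sep_mean([0.2, 0.8]))
```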

[0199] The particular methods for evaluating total separation as provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining total separation should be considered within the scope of the present invention; such that the present invention is directed not only to the method of determining total content separation but also to the much broader concept of using a measure of total separation of document content in the evaluation of a document's quality.

[0200] An overall separability measure for a document is determined by combining the total separations of all document content objects and groups. This can be done with a straight average. However, any object or group with a low separability value may adversely impact the value for the entire document and should therefore be given a higher weight, for example by combining the values as the root of a sum of powers.

[0201] The particular methods for evaluating overall separability as provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining overall separability should be considered within the scope of the present invention; such that the present invention is directed not only to the method of determining overall separability but also to the much broader concept of using a measure of overall separability of document content in the evaluation of a document's ease of use and a document's quality.

[0202] Separability may vary with level in the content tree hierarchy in which an object exists. An algorithm for com- 
puting separability by recursively traversing the content tree is provided herein which calculates a weighted average 
using weights w L which vary with content's tree level L. The following pseudocode is provided by way of example. 

Separability(G)
{
    if G is a leaf node
        then return TotalSep(G)
    otherwise
        for each child C of G
            call Separability(C)
        let A be the average of these values
        return w_L * TotalSep(G) + (1 - w_L) * A
}

The above Separability() routine should be started at the root node of the content tree.
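The recursive Separability() routine can be sketched in Python as follows, assuming TotalSep has already been computed and stored on each node of a hypothetical `ContentNode` tree. Numbering tree levels from 0 at the root and the constant default w_L = 0.5 are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ContentNode:
    """Hypothetical content-tree node carrying a precomputed TotalSep value."""
    total_sep: float
    children: List["ContentNode"] = field(default_factory=list)


def separability(g: ContentNode, level: int = 0,
                 w_l: Callable[[int], float] = lambda level: 0.5) -> float:
    """Recursive weighted average of TotalSep over the content tree."""
    if not g.children:                        # leaf node
        return g.total_sep
    a = sum(separability(c, level + 1, w_l) for c in g.children) / len(g.children)
    weight = w_l(level)                       # w_L for this tree level
    return weight * g.total_sep + (1 - weight) * a


doc = ContentNode(1.0, [ContentNode(0.2), ContentNode(0.6)])
print(separability(doc))   # 0.5 * 1.0 + 0.5 * 0.4
```

The Distinguishability() routine of [0225] has the same shape, with TotalDistinguish in place of TotalSep.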

[0203] The particular methods for evaluating a document's overall degree of separability are exemplary and are not 
to be considered as limiting in scope. Other methods for determining separability should be considered within the scope 
of the present invention, for example, a function of measured human responses to differing separation techniques; 
such that the present invention is directed to the much broader concept of determining separability for a document 
based on a combination of individual content separability measured in the context of evaluating document ease of use 
and document quality. 

[0204] As illustrated in Figures 43 to 46, another parameter or factor used in determining ease of use is the meas- 
urement and quantization of the document's distinguishability. 

[0205] In a preferred embodiment of the present invention, consider two identical paragraphs located at the top of two separate pages of a multi-page document, where these paragraphs are the only content on their respective pages. The degree of separability of these paragraph objects is based on a determination as to where one object ends and another begins. In this instance, the separability value would be high, since these objects have no neighboring objects on the same page. Note, however, that the closer objects are to one another, the easier it is to note their differences.
[0206] On the other hand, a measure of the distinguishability of these two paragraphs would be low because, absent neighboring objects providing a frame of reference, there are few clues as to which of the two paragraphs is actually being looked at.

[0207] A heading can distinguish the content that follows, as illustrated in Figure 44. The heading can be a separate 
paragraph at the start of a group of content objects (usually with a different style to distinguish it as a heading). Num- 
bering of list elements and, to a lesser degree, bullet elements also help distinguish content. There can be a hierarchy 
of headings, e.g., chapter, section, list element, etc. Each heading contributes to making an underlying object distin- 
guishable from neighboring objects. 

[0208] In general, the lower a heading is in the content tree, the smaller the set of content it applies to, and thus the more specific the identification. Lower-level headings in the content tree, and physically closer headings, therefore count more than higher-level ones.

[0209] The following recursive algorithm determines the heading contribution to the distinguishability of object G. It assumes that heading content objects have already been identified. A heading's contribution is weighted according to its distance up the tree from the original object.
[0210] HeadingDistinguish(G)
{
    if G is the root
        then return 0
    P = parent(G)
    if P is a list that numbers its elements
        then R = ListNumberWeight
    otherwise if P is a bulleted list
        then R = ListBulletWeight
    otherwise R = 0
    if a child of P is a heading
        then R = minimum(R + HeadingWeight, 1)
    return w * R + (1 - w) * HeadingDistinguish(P)
}

The expressions ListNumberWeight, ListBulletWeight and HeadingWeight are constant contributions to the heading result, with values between 0 and 1. ListNumberWeight should have the largest value, since list numbers are distinct and near their corresponding list element content objects. HeadingWeight and ListBulletWeight have lesser values: a heading applies to all list elements, and bullets are identical for all elements in the list. ListBulletWeight may be larger than HeadingWeight, since there will be a bullet close to the object. The weight w specifies the relative importance of a heading at the current tree level relative to headings at higher levels. For example, if w = 0.5, a heading at the current level is considered as important as the headings at all higher levels combined.
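HeadingDistinguish() can be sketched in Python as below. The `Node` class is hypothetical, and the three weight constants are illustrative values that only respect the ordering the text asks for (ListNumberWeight largest, ListBulletWeight above HeadingWeight).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical constants, ordered as the text suggests.
LIST_NUMBER_WEIGHT = 0.8
LIST_BULLET_WEIGHT = 0.4
HEADING_WEIGHT = 0.3


@dataclass(eq=False)
class Node:
    kind: str = "group"          # "group", "list", "heading", "paragraph", ...
    list_style: str = ""         # for lists: "numbered", "bulleted" or ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def heading_distinguish(g: Node, w: float = 0.5) -> float:
    if g.parent is None:                         # G is the root
        return 0.0
    p = g.parent
    if p.kind == "list" and p.list_style == "numbered":
        r = LIST_NUMBER_WEIGHT
    elif p.kind == "list" and p.list_style == "bulleted":
        r = LIST_BULLET_WEIGHT
    else:
        r = 0.0
    if any(child.kind == "heading" for child in p.children):
        r = min(r + HEADING_WEIGHT, 1.0)
    # Current level weighted by w, all higher levels together by (1 - w).
    return w * r + (1 - w) * heading_distinguish(p, w)


root = Node()
section = root.add(Node())
section.add(Node("heading"))
steps = section.add(Node("list", "numbered"))
item = steps.add(Node("paragraph"))
print(heading_distinguish(item))
```

For the list item, the list number contributes at its own level and the section heading one level up, so the numbering dominates the result.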

[0211] Object G and neighbor N should be distinguishable based on content type and value, as illustrated in Figure 45. For different types of content (1106, 1107 and 1108), their differences can be retrieved from a two-dimensional table indexed by content type. The table preferably contains values that express just how different those content types are. If type(G) does not match type(N), ContentDistinguish = TypeDistinguishTable[type(G)][type(N)].

[0212] If the types do match, content properties can be compared. For groups, lists and tables, the total number of 
words or characters for all of their contained elements can be compared. 

[0213] For example, for paragraphs, the number of words or characters can be counted. For lists, the number of list elements can be compared. For tables, the number of rows and columns can be compared. For graphic objects, size and shape can be compared. Since some object types may have several properties by which differences are measured, an overall difference is preferably calculated as a weighted sum of the various content differences for an object type. For example, $\mathrm{ContentDistinguish} = \sum_i w_i \, cd_i(G, N)$, where the sum is over the available content difference measures $i$, $w_i$ is the weight for the $i$th content difference measure, and $cd_i$ is the actual $i$th difference measure.
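ContentDistinguish can be sketched as follows. The TypeDistinguishTable entries, the dictionary representation of objects and the weights are hypothetical; scaling each count difference by the larger count (`rel_diff`) is an assumed normalization that the text does not specify.

```python
# Hypothetical TypeDistinguishTable, indexed [type(G)][type(N)].
TYPE_DISTINGUISH_TABLE = {
    "PARAGRAPH": {"LIST": 0.6, "TABLE": 0.8},
    "LIST": {"PARAGRAPH": 0.6, "TABLE": 0.5},
    "TABLE": {"PARAGRAPH": 0.8, "LIST": 0.5},
}


def rel_diff(a: float, b: float) -> float:
    """Assumed normalization of a count difference into [0, 1]."""
    m = max(a, b)
    return abs(a - b) / m if m > 0 else 0.0


def content_differences(g: dict, n: dict) -> dict:
    """cd_i(G, N) for two objects of the same type."""
    t = g["type"]
    if t == "PARAGRAPH":
        return {"words": rel_diff(g["words"], n["words"])}
    if t == "LIST":
        return {"elements": rel_diff(g["elements"], n["elements"])}
    if t == "TABLE":
        return {"rows": rel_diff(g["rows"], n["rows"]),
                "columns": rel_diff(g["columns"], n["columns"])}
    raise ValueError(f"no content measures defined for {t}")


def content_distinguish(g: dict, n: dict, weights: dict) -> float:
    if g["type"] != n["type"]:
        return TYPE_DISTINGUISH_TABLE[g["type"]][n["type"]]
    diffs = content_differences(g, n)
    return sum(weights[name] * d for name, d in diffs.items())


t1 = {"type": "TABLE", "rows": 4, "columns": 3}
t2 = {"type": "TABLE", "rows": 8, "columns": 3}
print(content_distinguish(t1, t2, {"rows": 0.5, "columns": 0.5}))
```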

[0214] Furthermore, objects can be distinguished by their position on their respective pages, as illustrated in Figure 46. Given the center positions $(x_G, y_G)$ and $(x_N, y_N)$ of object G and neighbor N (objects 110 on page 100), the distance between them can be calculated, preferably normalized by the page dimensions $W_p$ by $H_p$.

[0215] For example: $\mathrm{PositionDistinguish} = \left(\frac{(x_G - x_N)^2 + (y_G - y_N)^2}{W_p^2 + H_p^2}\right)^{1/2}$. This can be further limited by only considering nearby neighbors on the same page, in which case the same list of neighbors generated for separability can be reused. The cost of limiting comparisons to objects on a page, however, is the failure to recognize cases where objects on different pages are indistinguishable.
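PositionDistinguish is the center-to-center distance divided by the page diagonal. The sketch below uses `math.hypot`, which computes the same square root of a sum of squares as the formula; passing centers as (x, y) tuples is an assumed representation.

```python
from math import hypot


def position_distinguish(g_center, n_center, page_w: float, page_h: float) -> float:
    """Center-to-center distance divided by the page diagonal."""
    dx = g_center[0] - n_center[0]
    dy = g_center[1] - n_center[1]
    # sqrt((dx^2 + dy^2) / (W_p^2 + H_p^2)) == hypot(dx, dy) / hypot(W_p, H_p)
    return hypot(dx, dy) / hypot(page_w, page_h)


print(position_distinguish((0, 0), (3, 4), 3, 4))   # 1.0: opposite corners
```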

[0216] If any of the AlignmentSep, StyleSep, BackgroundSep and ContentDistinguish measures described above provides a strong difference, then the overall effective distinguishability should be high. The closer the neighbor is to the object, the easier it should be to observe their differences, so the result should receive a boost based on SpatialSep. The value of PositionDistinguish can be a further differentiator. If the boost $b$ is defined by $b = d/(d + \mathrm{SpatialSep})$, where the parameter $d$ controls the strength of the boost effect of spatial nearness, then:
$\mathrm{EffectiveDistinguish} = c - \left[w_a (c - b\,\mathrm{AlignmentSep})^{-p} + w_s (c - b\,\mathrm{StyleSep})^{-p} + w_b (c - b\,\mathrm{BackgroundSep})^{-p} + w_c (c - b\,\mathrm{ContentDistinguish})^{-p} + w_p (c - \mathrm{PositionDistinguish})^{-p}\right]^{-1/p}$,
where $w_a$, $w_s$, $w_b$, $w_c$ and $w_p$ are weights that give the relative importance of the alignment, style, background, content and position differences respectively, and should sum to 1. The constant $c$ is slightly larger than 1 to prevent division by zero. Note that this is the effective distinguishability between an object and one of its neighbors.
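EffectiveDistinguish can be sketched as below. The equal weights, d = 0.1, p = 2 and c = 1.001 are hypothetical defaults, and all inputs are assumed to lie between 0 and 1. Because b = d/(d + SpatialSep) equals 1 for touching objects and shrinks as they move apart, the "boost" acts as a scale that discounts differences between distant objects.

```python
def effective_distinguish(spatial_sep: float, alignment: float, style: float,
                          background: float, content: float, position: float,
                          weights=(0.2, 0.2, 0.2, 0.2, 0.2),
                          d: float = 0.1, p: float = 2.0,
                          c: float = 1.001) -> float:
    """EffectiveDistinguish between object G and one neighbor N ([0216])."""
    inputs = (spatial_sep, alignment, style, background, content, position)
    if any(not 0.0 <= v <= 1.0 for v in inputs):
        raise ValueError("measures must lie in [0, 1]")
    b = d / (d + spatial_sep)      # 1 when touching, smaller as SpatialSep grows
    terms = (b * alignment, b * style, b * background, b * content, position)
    total = sum(w * (c - t) ** (-p) for w, t in zip(weights, terms))
    return c - total ** (-1.0 / p)


near = effective_distinguish(0.0, 0.3, 0.3, 0.3, 0.3, 0.3)
far = effective_distinguish(0.9, 0.3, 0.3, 0.3, 0.3, 0.3)
print(near, far)
```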

[0217] To quantify the total distinguishability of a content object, it must be distinguished from all neighbors. In addition, any inherent features such as headings must also be considered. Total distinguishability can be determined by taking the minimum of the EffectiveDistinguish values over all neighbors.

[0218] Alternatively, one can raise each term to a power and then apply the inverse power to the sum:
$\mathrm{TotalDistinguish} = w_h \,\mathrm{HeadingDistinguish} + (1 - w_h)\left(\left[(1/n) \sum_i (c + \mathrm{EffectiveDistinguish}_i)^{-p}\right]^{-1/p} - c\right)$,
where $w_h$ is the weight of the HeadingDistinguish property relative to the neighbor differencing properties, $n$ is the number of neighbors, $c$ is a small constant to guard against division by zero, and the power $p$ determines how strongly close similarities dominate.
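Both forms of total distinguishability can be sketched as follows. The defaults w_h = 0.3, p = 2 and c = 0.01 are hypothetical. The text does not define the case of an object with no neighbors (and [0206] suggests isolation does not make an object distinguishable), so this sketch simply rejects it rather than guess a value.

```python
from typing import Sequence


def total_distinguish_min(effective: Sequence[float]) -> float:
    """[0217]: minimum EffectiveDistinguish over all neighbors."""
    return min(effective)


def total_distinguish(heading: float, effective: Sequence[float],
                      w_h: float = 0.3, p: float = 2.0, c: float = 0.01) -> float:
    """[0218]: heading term blended with a power mean over neighbors."""
    if not effective:
        raise ValueError("at least one neighbor is required")
    n = len(effective)
    mean = sum((c + e) ** (-p) for e in effective) / n
    neighbors = mean ** (-1.0 / p) - c
    return w_h * heading + (1 - w_h) * neighbors


print(total_distinguish(0.5, [0.4, 0.4]))   # 0.3 * 0.5 + 0.7 * 0.4
```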
[0219] A combination of distinguishability measures, as illustrated in Figure 43, is useful in evaluating the document's total distinguishability.

[0220] More specifically, the total distinguishability, as illustrated in Figure 43, is considered a combination of the 
effective distinguishability and the heading distinguishability. In Figure 43, the quantized distinguishability value is de- 
rived by a combining of the effective distinguishability and the heading distinguishability using a total distinguishability 
quantizer or combiner circuit 23. 

[0221] Although the illustration shows a circuit for the total distinguishability quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0222] A document's overall distinguishability can be defined as the combination of the total distinguishability values of all content objects and groups. These values can be combined using a straight average; alternatives, however, are possible.

[0223] If any neighbors are present, from which it is difficult to distinguish the object, then the overall distinguishability 
for the document should be low. One might argue that any object or group with a low distinguishability value adversely 
impacts the entire document and therefore should be given higher weight by combining as the root of the sum of powers. 
[0224] Another issue is whether the importance of distinguishability varies with the level in the content hierarchy. For example, should being able to distinguish chapters be more or less important than being able to distinguish paragraphs? An algorithm that computes a weighted average by recursively traversing the content tree was discussed previously for separability; again, the weights $w_L$ can vary with tree level $L$.

[0225] The distinguishability of a document can be determined from its content tree by the following pseudocode, called on the root node of the content tree.

Distinguishability(G)
{
    if G is a leaf node
        then return TotalDistinguish(G)
    otherwise
        for each child C of G
            call Distinguishability(C)
        let A be the average of these values
        return w_L * TotalDistinguish(G) + (1 - w_L) * A
}

[0226] The particular methods for evaluating a document's overall degree of distinguishability are exemplary and are not to be considered as limiting in scope. Other methods for determining distinguishability should be considered within the scope of the present invention, for example, a function of measured human responses to differing distinguishing devices; such that the present invention is directed to the much broader concept of determining distinguishability for a document based on a combination of individual content distinguishability measured in the context of evaluating document ease of use and document quality.

[0227] As illustrated in Figures 47 to 51, another parameter or factor used in determining ease of use is the measurement and quantization of the document's locatability.

[0228] In a preferred embodiment of the present invention, the term locatability is used to mean the ability to find or locate a particular content item from among all the other content items. A measure of locatability is constructed by examining the document factors that aid or inhibit the locating of content objects.

[0229] As illustrated in Figure 50, another parameter or factor used in determining ease of use is the measurement 
and quantization of the document's visibility. 

[0230] In a preferred embodiment of the present invention, one factor in determining an object's locatability is the visibility of the object, i.e., how well it can be seen against its background. As used herein, visibility means how easy it is to see the object, or how difficult it is to overlook it. Two characteristics are used herein in measuring the object's visibility: the size of the object 1110 (the larger the object, the easier it should be to detect and identify) and its difference 1111 from the background.

[0231] As a measure of the difference from the background (1111), luminance contrast is used, although other, more complex measures are envisioned. If the background is textured, the luminance contrast and color difference may not be well defined. Texture may also act to hide an object.

[0232] If colors are specified in red, green and blue (R, G, B) coordinates normalized to range between 0 and 1, then luminance can be given by $Y = y_r R + y_g G + y_b B$, where $y_r$, $y_g$ and $y_b$ are the luminance values of the red, green and blue primary colors respectively. The $y_r$, $y_g$ and $y_b$ values depend upon the details of the color space actually used, but typical values are 0.25, 0.68 and 0.07 respectively.

[0233] Contrast is calculated from the luminance $Y_f$ of the foreground and $Y_b$ of the background such that $\mathrm{Contrast} = 2|Y_b - Y_f| / (Y_b + Y_f)$. Since both contrast and size affect visibility, these values are combined by multiplying them together. While contrast is bounded (the expression above ranges between 0 and 2), size can be unbounded. For size to be bounded by 0 and 1, the object size is normalized by dividing it by the maximum size it can be. For example: $\mathrm{visibility} = \mathrm{contrast} \times (\text{object area}) / (\text{maximum area})$. In general, the maximum area is the area of the document, but if objects are restricted to a page, the page area can be used.
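The visibility computation of [0232] and [0233] can be sketched as follows, using the typical primary luminances quoted above. Treating two black colors as having zero contrast is an assumption to avoid division by zero. As written, the contrast formula reaches 2 for black on white, so a caller wanting a 0 to 1 scale could halve it (also an assumption).

```python
from typing import Tuple

RGB = Tuple[float, float, float]
# Typical primary luminances quoted in [0232]; they depend on the color space.
PRIMARY_LUMINANCE = (0.25, 0.68, 0.07)


def luminance(rgb: RGB, y=PRIMARY_LUMINANCE) -> float:
    """Y = yr*R + yg*G + yb*B for R, G, B normalized to [0, 1]."""
    return sum(weight * channel for weight, channel in zip(y, rgb))


def contrast(y_fore: float, y_back: float) -> float:
    """Contrast = 2|Yb - Yf| / (Yb + Yf); lies between 0 and 2 as written."""
    if y_fore + y_back == 0:
        return 0.0                      # both black: no contrast
    return 2 * abs(y_back - y_fore) / (y_back + y_fore)


def visibility(fore: RGB, back: RGB, object_area: float, max_area: float) -> float:
    """Contrast times object area normalized by the page or document area."""
    return contrast(luminance(fore), luminance(back)) * object_area / max_area


# A black block covering a tenth of a white page.
print(visibility((0, 0, 0), (1, 1, 1), 0.1 * 612 * 792, 612 * 792))
```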

[0234] The particular methods for evaluating an object's degree of visibility are exemplary and are not to be consid- 
ered as limiting in scope. Other methods for determining visibility should be considered within the scope of the present 
invention, for example, a function of measured human responses to object characteristics with respect to its visibility. 

[0235] As illustrated in Figure 49, another parameter or factor used in determining ease of use is the measurement and quantization of the document's structural locatability.

[0236] In a preferred embodiment of the present invention, another factor in the ease of locating a document element is the presence of structural aids (such as headings and bullets) within the document. This measure is termed the structural locatability and can be implemented by a tree or table look-up where the result is a predefined value that depends on the type and style of the structure that contains the element. For example, a decision tree that sets a structural location term StructLocate for element E might look as follows:

G = parent(E)
if G is a table
    then if G has row headings
        then if G has column headings
            then StructLocate = Vtrc
            otherwise StructLocate = Vtr
        otherwise if G has column headings
            then StructLocate = Vtc
            otherwise StructLocate = Vt
    otherwise if G is a list
        then if G has bullets
            then StructLocate = Vlb
            otherwise if G has numbers
                then StructLocate = Vln
                otherwise StructLocate = Vl
        otherwise StructLocate = Vg

where Vtrc, Vtr, Vtc, Vt, Vlb, Vln, Vl and Vg are the predetermined locatability contributions for the structural cases.
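The StructLocate decision tree translates directly into Python. The `Container` description of parent(E) and the numeric contributions in `V` are hypothetical; the text leaves the eight values to the designer.

```python
from dataclasses import dataclass

# Hypothetical locatability contributions; the text leaves their values open.
V = {"trc": 1.0, "tr": 0.8, "tc": 0.8, "t": 0.5,
     "lb": 0.6, "ln": 0.7, "l": 0.4, "g": 0.3}


@dataclass
class Container:
    """Hypothetical description of G = parent(E)."""
    kind: str                    # "table", "list" or anything else
    row_headings: bool = False
    column_headings: bool = False
    bullets: bool = False
    numbers: bool = False


def struct_locate(g: Container) -> float:
    """Decision tree of [0236]; g is the structure containing element E."""
    if g.kind == "table":
        if g.row_headings:
            return V["trc"] if g.column_headings else V["tr"]
        return V["tc"] if g.column_headings else V["t"]
    if g.kind == "list":
        if g.bullets:
            return V["lb"]
        return V["ln"] if g.numbers else V["l"]
    return V["g"]


print(struct_locate(Container("table", row_headings=True, column_headings=True)))
```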
[0237] The particular methods for evaluating a document's structural locatability are exemplary and are not to be 
considered as limiting in scope. Other methods for determining structural locatability should be considered within the 
scope of the present invention, for example, a function of measured human responses to structural aids to locating 
objects; such that the present invention is directed to the much broader concept of determining structural locatability 
for a document based on a combination of individual content structure measured in the context of evaluating document 
ease of use and document quality. 

[0238] In addition to structural contributions, a member of a group may be identified by its effective distinguishability 
from other group members. For example, one might locate the long paragraph in a group and ignore the short ones, 
or locate the middle paragraph of a list. The methods of measuring effective distinguishability can also be used for 
locatability. However, instead of comparing the object to its neighbors, the object is compared to its sibling members 
in the group. 

[0239] Having calculated the EffectiveDistinguish value between the group element under consideration and each of the other sibling members, the results can be combined as follows:

$$\text{DistinguishLocate} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(c + \text{EffectiveDistinguish}_i\right)^{-p}\right]^{-1/p} - c$$

where the sum is over all n sibling group members. The constants c and p have the same effect as in the TotalDistinguish calculation and may take the same values.
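A minimal Python sketch of the DistinguishLocate combination, assuming the EffectiveDistinguish values against each sibling are already available as numbers in [0, 1]. The defaults c = 0.01 and p = 2 are illustrative choices, not values given in the text.

```python
def distinguish_locate(effective_distinguish, c: float = 0.01, p: float = 2.0) -> float:
    """Combine an element's EffectiveDistinguish values against its n siblings.

    DistinguishLocate = [(1/n) * sum((c + e)**(-p))]**(-1/p) - c
    The negative exponent lets low values dominate: a single sibling that
    is hard to tell apart pulls the result down sharply.
    """
    values = list(effective_distinguish)
    if not values:
        raise ValueError("need at least one sibling comparison")
    mean_power = sum((c + e) ** (-p) for e in values) / len(values)
    return mean_power ** (-1.0 / p) - c
```

With equal inputs the result equals that common value; with inputs 0.9 and 0.1 it falls to roughly 0.14, far below the arithmetic mean of 0.5.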

[0240] The ease of locating a member item within a group depends upon the number of items the group contains. If 
there are only one or two items in the group then it will be easy to locate an item. But if there are a thousand items, 
the task of locating one in particular will be more difficult. This depends upon the presentation method. For instance, 
finding an item presented in a table of 100 elements is not as difficult as finding the item in a list of 100 elements. A 
factor for the effects of the size of the group containing element E is calculated as: 
G = parent(E)
if G is a table
  then GroupSizeFactor = (1 - A + A/rows(G)) * (1 - A + A/columns(G))
  otherwise GroupSizeFactor = 1 - A + A/elements(G)

where rows(G) and columns(G) are the numbers of rows and columns in the table G, elements(G) is the number of elements in the group G, and A is a parameter controlling how strongly the factor varies with group size.
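The GroupSizeFactor branch can be sketched in Python as follows. Restricting A to [0, 1] and the default A = 0.5 are assumptions made here so that the factor stays between 1 - A and 1.

```python
def group_size_factor(kind: str, A: float = 0.5, rows: int = 1,
                      columns: int = 1, elements: int = 1) -> float:
    """GroupSizeFactor for the group G that contains element E.

    For a table, rows and columns each contribute a factor; for any other
    group only the element count matters. A = 0 disables the effect.
    """
    if not 0.0 <= A <= 1.0:
        raise ValueError("A is assumed to lie in [0, 1]")
    if kind == "table":
        return (1 - A + A / rows) * (1 - A + A / columns)
    return 1 - A + A / elements
```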
[0241] The structural contribution to locating a group member is combined with the distinguishability contribution. A weighted sum of the two contributions is used, where the weights determine the relative importance of the two factors. However, it can be argued that if either contribution allows one to locate the element, then the overall result should be high, regardless of the other contribution.

[0242] The combined result should decrease with the size of the group. This can be achieved by:

$$\text{MemberLocate} = \left(c - \left[w_m\,(c - \text{StructLocate})^{-p} + (1 - w_m)\,(c - \text{DistinguishLocate})^{-p}\right]^{-1/p}\right)\cdot\text{GroupSizeFactor}$$

where w_m is the weight of the structural contribution relative to the distinguishability contribution, c is a constant slightly larger than 1, and p is a number greater than 1.
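As a sketch only, MemberLocate can be computed in Python as below. The defaults w_m = 0.5, c = 1.01 and p = 2 are illustrative assumptions; the inputs are assumed to lie in [0, 1].

```python
def member_locate(struct_locate: float, distinguish_locate: float,
                  group_size_factor: float, w_m: float = 0.5,
                  c: float = 1.01, p: float = 2.0) -> float:
    """MemberLocate: a combination that favors the higher contribution,
    scaled down by the group-size factor.

    Each term is (c - x)**(-p); as x approaches 1 the term grows large,
    the -1/p root of the bracket shrinks, and c minus it approaches c.
    """
    bracket = (w_m * (c - struct_locate) ** (-p)
               + (1.0 - w_m) * (c - distinguish_locate) ** (-p))
    return (c - bracket ** (-1.0 / p)) * group_size_factor
```

For example, a strong structural cue (0.95) rescues a weak distinguishability value (0.1): the unscaled result is about 0.93, matching the argument in [0241] that either contribution alone should suffice.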

[0243] A combination of locatability measures, as illustrated in Figure 48, is useful in evaluating the document's 
member locatability. 

[0244] More specifically, the member locatability, as illustrated in Figure 48, is considered a combination of the structural locatability and/or the distinguished locatability, both as described above. In Figure 48, the quantized member locatability value is derived by combining the structural locatability and the distinguished locatability using a member locatability quantizer or combiner circuit 25.

[0245] Although the illustration shows a circuit for the member locatability quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0246] A further combination of locatability measures, as illustrated in Figure 47, is useful in evaluating the document's direct locatability.

[0247] More specifically, the direct locatability, as illustrated in Figure 47, is considered a combination of the member locatability, distinguishability, separability, and/or visibility. In Figure 47, the quantized direct locatability value is derived by combining the member locatability, distinguishability, separability, and/or visibility using a direct locatability quantizer or combiner circuit 24.

[0248] Although the illustration shows a circuit for the direct locatability quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0249] In a preferred embodiment of the present invention, another mechanism to aid in locating an element is a reference or link to that element, such as a page number in a table of contents or a hyperlink in an electronic document. For example, a paragraph might be found through the table of contents or by looking in the index for the location of a particular word. The ease of location may not vary linearly with the number of references. If the number of references to the element under consideration is N_r, then a function that increases non-linearly from 0 to 1 with increasing N_r can be written as:

$$\text{ReferenceLocate} = 1 - (N_r + 1)^{-1/p}$$

where p determines how strongly additional references contribute.
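A one-line Python sketch of ReferenceLocate; the default p = 2 is an illustrative assumption. With p = 2, one reference gives 1 - 2^(-1/2), about 0.29, and three references give exactly 0.5.

```python
def reference_locate(num_references: int, p: float = 2.0) -> float:
    """ReferenceLocate = 1 - (Nr + 1)**(-1/p): 0 with no references, tending to 1."""
    if num_references < 0:
        raise ValueError("the number of references cannot be negative")
    return 1.0 - (num_references + 1) ** (-1.0 / p)
```

A larger p makes the curve rise more slowly, so each additional reference adds less.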

[0250] The particular methods for evaluating a contribution of references to the ability to locate objects are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from references should be considered within the scope of the present invention, for example, a function of measured human responses to differing degrees of referencing; such that the present invention is directed to the much broader concept of determining the effect of referencing on the measures of locatability in the context of evaluating document ease of use and document quality.

[0251] As illustrated in Figure 51, another parameter or factor used in determining ease of use is the measurement and quantization of the document's total locatability.

[0252] In a preferred embodiment of the present invention, the above individual locatability contributions can be combined into a total locatability measure. First, note that if any of the first four contributing measures is low for an item, then that item is likely to be hard to locate, as it will either be hard to see or be confused with its neighbors or siblings. These four contributions can be combined as follows:

$$\text{DirectLocate} = \left[w_v\,(c + \text{Visibility})^{-p} + w_s\,(c + \text{TotalSep})^{-p} + w_d\,(c + \text{TotalDistinguish})^{-p} + w_{dl}\,(c + \text{DistinguishLocate})^{-p}\right]^{-1/p} - c$$

where w_v, w_s, w_d and w_dl are the weights describing the relative importance of the contributions and sum to 1; c is a small number used to prevent division by zero; and p determines how strongly one bad contribution to locatability spoils the overall result.
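A minimal Python sketch of DirectLocate, assuming all four inputs lie in [0, 1]. Equal weights, c = 0.01 and p = 2 are hypothetical defaults chosen only for the example.

```python
def direct_locate(visibility: float, total_sep: float, total_distinguish: float,
                  distinguish_locate: float,
                  weights=(0.25, 0.25, 0.25, 0.25),
                  c: float = 0.01, p: float = 2.0) -> float:
    """DirectLocate: weighted power mean with exponent -p, so that any
    single low contribution drags the result down."""
    values = (visibility, total_sep, total_distinguish, distinguish_locate)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights w_v, w_s, w_d, w_dl must sum to 1")
    bracket = sum(w * (c + v) ** (-p) for w, v in zip(weights, values))
    return bracket ** (-1.0 / p) - c
```

Three good values (0.9) and one poor value (0.05) give roughly 0.11, illustrating how one bad contribution spoils the result.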
[0253] Next, the measures for locating the item directly, locating it through references, and locating it through its parent can all be combined. Thus:

$$\text{TotalLocate} = c - \left[w_n\,(c - \text{DirectLocate})^{-p} + w_r\,(c - \text{ReferenceLocate})^{-p} + w_p\,(c - \text{TotalLocate}(\text{parent}))^{-p}\right]^{-1/p}$$

where the weights w_n, w_r and w_p sum to 1, c is a number slightly larger than 1, and p is a number greater than or equal to 1.
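The TotalLocate combination can be sketched in Python as below. The weights, c and p defaults are illustrative, and passing 0.0 as the parent's TotalLocate for the root node (which has no parent route) is an assumption, since the text does not say how the root is handled.

```python
def total_locate(direct: float, reference: float, parent_total: float,
                 weights=(0.5, 0.2, 0.3), c: float = 1.01, p: float = 2.0) -> float:
    """TotalLocate: favors whichever route (direct, reference, or via the
    parent) makes the item easiest to find.

    parent_total is TotalLocate of the parent; for the root we assume 0.0.
    """
    w_n, w_r, w_p = weights
    bracket = (w_n * (c - direct) ** (-p)
               + w_r * (c - reference) ** (-p)
               + w_p * (c - parent_total) ** (-p))
    return c - bracket ** (-1.0 / p)
```

Because the recursion runs from the root downward, TotalLocate for each node can be computed in a top-down pass that hands each child its parent's value.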

[0254] An overall locatability for a document is determined by combining the total locatability values of all document content objects and groups. The simplest way to combine these values is a straight average. Just as for separability and distinguishability, one might argue that any object or group with a low locatability value strongly impacts the entire document and should be given higher weight, for example by combining the values as the root of a mean of powers.

[0255] The document's overall locatability gives an overall sense of how easy it is to locate items in a document, by calculating and combining measures of how easy it is to locate each and every document component. An algorithm for computing document locatability is provided herein; it recursively traverses the content tree to calculate a weighted average, where the weights w_L can vary with tree level L. To find the overall Locatability of a document, the following routine is executed on the root node of the content tree.

Locatability(G)
{
  if G is a leaf node
    then return TotalLocate(G)
  otherwise
    for each child C of G
      call Locatability(C)
    A = the average of these values
    return wL * TotalLocate(G) + (1 - wL) * A
}
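The recursive Locatability routine can be sketched in Python as follows. The `Node` class, the precomputed `total_locate` field, and the constant default weight of 0.5 per level are hypothetical; `w_level` is a callable so that w_L can vary with tree level L as the text allows.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical content-tree node carrying a precomputed TotalLocate value."""
    total_locate: float
    children: List["Node"] = field(default_factory=list)


def locatability(g: Node, w_level: Callable[[int], float] = lambda level: 0.5,
                 level: int = 0) -> float:
    """Blend a node's own TotalLocate with the mean locatability of its children."""
    if not g.children:
        return g.total_locate
    a = sum(locatability(c, w_level, level + 1) for c in g.children) / len(g.children)
    w = w_level(level)
    return w * g.total_locate + (1 - w) * a


root = Node(0.8, [Node(0.4), Node(0.6)])
print(locatability(root))  # close to 0.65: 0.5 * 0.8 + 0.5 * 0.5
```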

[0256] The particular methods for evaluating a document's overall degree of locatability are exemplary and are not to be considered as limiting in scope. Other methods for determining locatability should be considered within the scope of the present invention, for example, a function of measured human responses to differing techniques for locating content objects; such that the present invention is directed to the much broader concept of determining locatability for a document based on a combination of individual content locatability measured in the context of evaluating document ease of use and document quality.

[0257] A combination of locatability measures, as illustrated in Figure 51, is useful in evaluating the document's total locatability.

[0258] More specifically, the total locatability, as illustrated in Figure 51, is considered a combination of the direct locatability, reference locatability, and/or parent's locatability. In Figure 51, the quantized total locatability value is derived by combining the direct locatability, reference locatability, and/or parent's locatability using a total locatability quantizer or combiner circuit 26.

[0259] Although the illustration shows a circuit for the total locatability quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0260] In a preferred embodiment of the present invention, a document's degree of searchability can be determined by first determining a value for the strength of searchability of the document, and then determining the document's search density, that is, the strength of searchability relative to the size of the document. The search density is mapped to a value that ranges between 0 and 1, which in one embodiment consists of evaluating the relationship:

$$\text{Searchability} = 1 - \left(\frac{c}{c + \text{SearchDensity}}\right)^{p}$$

where c is a constant representing the size of a typical search density and p determines how quickly searchability approaches 1 with increasing search density.
[0261] The strength of searchability is determined by features of the document intended to aid in searching. These features include at least one of the number of table elements, the number of list elements, the number of list bullets, the number of list element numbers, and the number of other reference terminals, a reference terminal being a position indicator that can be used by a reference, such as a label, a chapter number for a textual reference, or an anchor for a hyperlink.

[0262] One method for collecting such features is to traverse the content tree looking for the features and incrementing counters when they are discovered.

[0263] An exemplary recursive algorithm to collect these features is as follows: 

CollectSearchFeatures(G)
{
  if G is a table
    then Ft = Ft + number of elements in G
         for each element E of G
           CollectSearchFeatures(E)
  otherwise if G is a list
    then Fl = Fl + number of elements in G
         if G is bulleted
           then Fb = Fb + number of bullets in the list G
         if G is numbered
           then Fn = Fn + number of numbered elements in G
         for each element E of G
           CollectSearchFeatures(E)
  otherwise if G is a group
    then for each element E of G
           CollectSearchFeatures(E)
  otherwise
    if G acts as a reference label
      then Fr = Fr + 1
    if G is an anchor
      then Fa = Fa + 1
}
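A runnable Python sketch of CollectSearchFeatures follows. The `Item` tree node is hypothetical, a `Counter` replaces the global counters Ft through Fa, and it is assumed that every element of a bulleted (or numbered) list carries a bullet (or number).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical content-tree node: kind is "table", "list", "group" or a leaf type."""
    kind: str = "text"
    children: List["Item"] = field(default_factory=list)
    bulleted: bool = False
    numbered: bool = False
    is_reference_label: bool = False
    is_anchor: bool = False


def collect_search_features(g: Item, counts: Optional[Counter] = None) -> Counter:
    """Accumulate the feature counts Ft, Fl, Fb, Fn, Fr and Fa over the subtree g."""
    if counts is None:
        counts = Counter()
    if g.kind in ("table", "list", "group"):
        if g.kind == "table":
            counts["Ft"] += len(g.children)
        elif g.kind == "list":
            counts["Fl"] += len(g.children)
            # Assumption: each element of a bulleted/numbered list has a bullet/number.
            if g.bulleted:
                counts["Fb"] += len(g.children)
            if g.numbered:
                counts["Fn"] += len(g.children)
        for child in g.children:
            collect_search_features(child, counts)
    else:
        # Leaf node: count reference labels and hyperlink anchors.
        if g.is_reference_label:
            counts["Fr"] += 1
        if g.is_anchor:
            counts["Fa"] += 1
    return counts
```

Missing keys in a `Counter` read as 0, so a document without numbered lists simply reports Fn = 0.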
[0264] An overall strength of searchability can be formed as the weighted sum of the various feature contributors. For example:

$$\text{SearchStrength} = w_t F_t + w_l F_l + w_b F_b + w_n F_n + w_r F_r + w_a F_a$$

where w_t, w_l, w_b, w_n, w_r and w_a are the weights and sum to 1.

[0265] The size of the document may also influence searchability: having n features in a small document should count more than having n features in a large one. Document size can be defined as the amount of information the document contains, which can be approximated by the number of characters in the document description. For example:

$$\text{SearchDensity} = \frac{\text{SearchStrength}}{\text{NumberOfCharacters}}$$

[0266] This provides a measure of the document's search-enabling characteristics, but it is potentially unbounded. It can be converted to a measure that varies between 0 and 1. For example:

$$\text{Searchability} = 1 - \left(\frac{c}{c + \text{SearchDensity}}\right)^{p}$$

where c and p determine how quickly Searchability approaches 1 with increasing SearchDensity.
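The searchability pipeline of [0264] to [0266] can be sketched in Python as follows. The sketch assumes the bounded form 1 - (c/(c + SearchDensity))^p, which is 0 for a document with no search features; the defaults c = 0.01 (one weighted feature per hundred characters) and p = 1 are hypothetical.

```python
def search_strength(features: dict, weights: dict) -> float:
    """SearchStrength: weighted sum of the feature counts Ft, Fl, Fb, Fn, Fr, Fa."""
    return sum(w * features.get(name, 0) for name, w in weights.items())


def searchability(features: dict, weights: dict, num_characters: int,
                  c: float = 0.01, p: float = 1.0) -> float:
    """Map SearchDensity = SearchStrength / NumberOfCharacters onto [0, 1)."""
    if num_characters <= 0:
        raise ValueError("document must contain at least one character")
    density = search_strength(features, weights) / num_characters
    return 1.0 - (c / (c + density)) ** p


W = {name: 1 / 6 for name in ("Ft", "Fl", "Fb", "Fn", "Fr", "Fa")}
print(searchability({"Ft": 60}, W, 1000))  # approximately 0.5: density equals c
```

When the density equals c the result is 1 - 0.5^p, which is why c can be read as the size of a typical search density.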

[0267] The particular methods for evaluating a document's overall degree of searchability are exemplary and are not to be considered as limiting in scope. Other methods for determining searchability should be considered within the scope of the present invention, for example, a function of measured human responses to differing search-affecting features; such that the present invention is directed to the much broader concept of determining searchability for a document based on a combination of individual content search-supporting features in the context of evaluating document ease of use and document quality.

[0268] As illustrated in Figures 52 to 57, another parameter or factor used in determining ease of use is the meas- 
urement and quantization of the document's group identity. 

[0269] In a preferred embodiment of the present invention, group identity is the ability to see the members of a group as a group. One indicator of group identity is referred to herein as Spatial Coherence, meaning that the members of a group are all located close together on the page. Other indicators include the presence of a common background or surrounding border, a uniform style among the elements, alignment of the elements, organization of the elements into a list or a table, and the presence of a heading for the group. How to measure and combine these indicators is now discussed.

[0270] As illustrated in Figures 53 and 54, another parameter or factor used in determining ease of use is the measurement and quantization of the document's spatial coherence.

[0271] In a preferred embodiment of the present invention, spatial coherence is calculated as follows when all the group elements (110 of Figure 55) lie on the same page (100 of Figure 55). Here, it is assumed that the bounding box (1120 of Figure 55) for a group or a group element can be found. The bounding box 1120 gives the width and height of the minimal upright (axis-aligned) rectangle that encloses the item. For this determination, area is the width times the height: $A(E) = W(E)\,H(E)$. The spatial coherence of group G then becomes:

$$\text{SpatialCoherence} = \frac{\sum_i A(E_i)}{A(G)}$$

where the sum is over the elements E_i of group G.

[0272] Alternatively, one might, for example, take the square root of the above expression, making it more like a comparison of perimeters than of areas. Or one could compute the perimeter of the convex hull of the group objects and divide the circumference of a circle whose area matches the total area of the elements by that perimeter.

[0273] When group elements are spread over two or more pages, one can determine the spatial coherence for each 
page and then combine the results. A weighted average can be used where the weight for a page is proportional to 
the number of elements on that page. One should also include a penalty for separating the group over pages. For 
example, one could divide by the number of pages involved. 
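A Python sketch of spatial coherence covering both the single-page ratio of [0271] and the multi-page combination of [0273], using the example penalty of dividing by the number of pages. The `Box` class is hypothetical; overlapping element boxes could push the single-page ratio above 1, which this sketch does not guard against.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Box:
    """Hypothetical bounding box of one group element on a given page."""
    x: float
    y: float
    w: float
    h: float
    page: int = 0


def page_coherence(boxes: List[Box]) -> float:
    """Single page: sum of element areas divided by A(G), the group's bounding-box area."""
    x0 = min(b.x for b in boxes)
    y0 = min(b.y for b in boxes)
    x1 = max(b.x + b.w for b in boxes)
    y1 = max(b.y + b.h for b in boxes)
    return sum(b.w * b.h for b in boxes) / ((x1 - x0) * (y1 - y0))


def spatial_coherence(boxes: List[Box]) -> float:
    """Average per-page coherence weighted by element count, then divide by the
    number of pages as a penalty for splitting the group across pages."""
    pages: Dict[int, List[Box]] = {}
    for b in boxes:
        pages.setdefault(b.page, []).append(b)
    weighted = sum(len(bs) * page_coherence(bs) for bs in pages.values())
    return weighted / len(boxes) / len(pages)
```

Two adjacent unit squares give 1.0; the same squares placed diagonally three units apart give 0.125; splitting the adjacent pair across two pages gives 0.5.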

[0274] Figure 53 is an example of low spatial coherence. Figure 54 is an example of high spatial coherence.

[0275] The particular methods for evaluating spatial coherence provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining spatial coherence should be considered within the scope of the present invention, for example, a function of measured human responses to differing spatial placements of content objects; such that the present invention is directed not only to the particular method of determining spatial coherence but also to the much broader concept of using a measure of spatial coherence of content objects in a determination of content group identity in the context of determining document ease of use and document quality.
[0276] As illustrated in Figures 56 and 57, another parameter or factor used in determining ease of use is the meas- 
urement and quantization of the document's consistency of style. 

[0277] In a preferred embodiment of the present invention, another indicator that elements belong to a group is that they all have the same style. One measure of consistency of style for a group is to define the sameness of style of two elements as 1 - StyleSep, where StyleSep measures their difference in style, then to compare all of the group elements pair-wise and combine their sameness values. Combining can be done by averaging.
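The pair-wise averaging of [0277] can be sketched in Python as below. Here `fraction_differing` is a hypothetical stand-in for StyleSep (the fraction of style properties whose values differ), and treating a one-element group as fully consistent is an assumption.

```python
from itertools import combinations


def fraction_differing(a: dict, b: dict) -> float:
    """Hypothetical StyleSep: fraction of style properties whose values differ."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) != b.get(k) for k in keys) / len(keys)


def style_sameness(elements, style_sep=fraction_differing) -> float:
    """Average of 1 - StyleSep over all unordered pairs of group elements."""
    pairs = list(combinations(elements, 2))
    if not pairs:
        return 1.0  # assumption: a single element is trivially consistent
    return sum(1.0 - style_sep(a, b) for a, b in pairs) / len(pairs)
```

For three elements where one uses a different font, two of the three pairs score 0.5 and one scores 1, so the group sameness is 2/3.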

[0278] The particular methods for evaluating sameness of style provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining sameness of style should be considered within the scope of the present invention; such that the present invention is directed not only to the particular method of determining sameness of style but also to the much broader concept of using sameness of style in a determination of content group identity in the context of determining document ease of use and document quality.
[0279] One method that looks deeper than just the first level of the group when comparing styles is to move recursively down the content tree and compare the leaves for consistency of style. The style of each leaf discovered can be compared to the style of the first leaf in the tree. Since one is looking for style features that tie all members of the group together, a simple check is to compare style properties to those of the first leaf. If any leaf has a different value for a property, then that property cannot be used as an indicator of group membership.

[0280] The number of style properties that are consistent across all members is counted, and that value becomes a measure of style consistency. A procedure to get the first leaf looks as follows:

GetFirstLeaf(G)
{
  if G is a leaf
    then return G
  otherwise return GetFirstLeaf(FirstElement(G))
}

An exemplary procedure to traverse the tree, compare style properties and return the overall consistency would be as follows:

LeafConsistency(G, StyleProperties, CurrentConsistency)
{
  if G is a leaf
    then CurrentConsistency = CheckConsistency(G, StyleProperties, CurrentConsistency)
  otherwise
    for each element E of G
      CurrentConsistency = LeafConsistency(E, StyleProperties, CurrentConsistency)
  return CurrentConsistency
}

where StyleProperties is an array containing the style property values for the first leaf and CurrentConsistency is an 
array indicating for each style property whether all leaves checked thus far have a uniform value. 
[0281] The actual checking of style properties against those of the first leaf might be done as follows:

CheckConsistency(G, StyleProperties, CurrentConsistency)
{
  for each style property i
    if StyleValue(G, i) does not match StyleProperties[i]
      then CurrentConsistency[i] = 0
  return CurrentConsistency
}

The procedure for checking consistency of style would look as follows:

StyleConsistency(G)
{
  E = GetFirstLeaf(G)
  for each style property i
  {
    StyleProperties[i] = StyleValue(E, i)
    CurrentConsistency[i] = 1
  }
  CurrentConsistency = LeafConsistency(G, StyleProperties, CurrentConsistency)
  return the sum of the CurrentConsistency array values divided by the array size
}
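A compact Python sketch of GetFirstLeaf, LeafConsistency, CheckConsistency and StyleConsistency together. The `Node` class with a `style` dictionary on leaves is a hypothetical representation, and returning 0.0 when the first leaf has no style properties is an assumption, since the ratio is otherwise undefined.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical content-tree node; `style` is only meaningful on leaves."""
    style: Dict[str, object] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def first_leaf(g: Node) -> Node:
    """GetFirstLeaf: follow first elements down to a leaf."""
    while g.children:
        g = g.children[0]
    return g


def leaf_consistency(g: Node, reference: dict, consistent: dict) -> None:
    """LeafConsistency + CheckConsistency: clear flags for mismatching properties."""
    if not g.children:
        for prop, value in reference.items():
            if g.style.get(prop) != value:
                consistent[prop] = False
    else:
        for child in g.children:
            leaf_consistency(child, reference, consistent)


def style_consistency(g: Node) -> float:
    """Fraction of the first leaf's style properties shared by every leaf."""
    reference = first_leaf(g).style
    if not reference:
        return 0.0  # assumption: no properties, nothing ties the group together
    consistent = {prop: True for prop in reference}
    leaf_consistency(g, reference, consistent)
    return sum(consistent.values()) / len(consistent)
```

Here the flags are updated in place rather than returned, which is equivalent to reassigning CurrentConsistency at each step of the pseudocode.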

[0282] Even more sophisticated calculations can be done. Figure 56 is an example of poor consistency of style. 
Figure 57 is an example of good consistency of style. 

[0283] The particular methods for evaluating consistency provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining consistency should be considered within the scope of the present invention, for example, a function of measured human responses to the consistency of styles for content objects; such that the present invention is directed not only to the particular method of determining consistency but also to the much broader concept of using a measure of consistency in a determination of content group identity in the context of determining document ease of use and document quality.

[0284] It can be argued that the further down the tree one must search for a leaf node, the less that node reflects the properties of the actual group being analyzed. One might, therefore, wish to search the tree for leaf nodes only to a fixed depth. Non-leaf nodes can also be compared to one another for consistency of their properties; for non-leaf nodes, one might compare only tables to tables, lists to lists, and so on. But this raises the question of what the proper search depth is. One way to resolve it is to calculate consistency for all depths and combine the results, weighting shallow depths more heavily than deep ones.
[0285] In a preferred embodiment of the present invention, measures for the contributions to group identity from structure, headings, borders and backgrounds can also be calculated. Assuming a means exists for determining whether a group object has a background (or border), whether it has a heading element, and whether it is a list or a table, a heading indicator can be created based on whether the group contains a heading. The following pseudocode illustrates this:

if the first element of the group is a heading
  then HasHeading = 1
  otherwise HasHeading = 0

Similarly, explicit background elements and/or borders can be examined, as in the following pseudocode:

if the group has its own background
  then HasBackground = 1
  otherwise HasBackground = 0
if the group has a border
  then HasBorder = 1
  otherwise HasBorder = 0

[0286] A table lookup can be used to obtain a structural contribution based on the type of group. Lists and tables should be more easily recognized as coherent objects than unstructured groups, as given by: StructuralIdentity = StructIdentTable[type(G)].

[0287] These indicators of group identity can be combined into an overall identity measure given by a weighted average, but a preferred embodiment takes the root of a weighted average of powers, as in:

$$\begin{aligned}\text{GroupIdentity} = c - \big[\,& w_{sp}\,(c - \text{SpatialCoherence})^{-p} + w_{ah}\,(c - \text{alignH})^{-p} + w_{av}\,(c - \text{alignV})^{-p} \\ &+ w_{st}\,(c - \text{StyleConsistency})^{-p} + w_{h}\,(c - \text{HasHeading})^{-p} + w_{bk}\,(c - \text{HasBackground})^{-p} \\ &+ w_{bd}\,(c - \text{HasBorder})^{-p} + w_{si}\,(c - \text{StructuralIdentity})^{-p}\,\big]^{-1/p}\end{aligned}$$

where w_sp, w_st, w_ah, w_av, w_h, w_bk, w_bd and w_si are the weights and sum to 1. The parameters c and p control the degree to which a single good value dominates: the constant c is slightly larger than 1 and the power p is typically 1 or larger. In this way the indicators are combined using a power function that favors high values.
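A Python sketch of the GroupIdentity combination, assuming the eight indicators are supplied as a dictionary of values in [0, 1]. The equal weights in the usage note and the defaults c = 1.01, p = 2 are hypothetical.

```python
def group_identity(indicators: dict, weights: dict,
                   c: float = 1.01, p: float = 2.0) -> float:
    """GroupIdentity = c - [sum_k w_k * (c - x_k)**(-p)]**(-1/p).

    Favors high values: one strong indicator (e.g. HasHeading = 1) is
    enough to give the group a high identity score.
    """
    if set(indicators) != set(weights):
        raise ValueError("indicators and weights must name the same terms")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    bracket = sum(w * (c - indicators[k]) ** (-p) for k, w in weights.items())
    return c - bracket ** (-1.0 / p)


KEYS = ("SpatialCoherence", "alignH", "alignV", "StyleConsistency",
        "HasHeading", "HasBackground", "HasBorder", "StructuralIdentity")
```

With equal weights of 1/8, a group whose only positive indicator is a heading still scores about 0.98, showing how a single good value dominates when c is close to 1.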

[0288] Just as for separability and distinguishability, any object or group with a low group identity value may strongly impact the entire document and is preferably given a higher weight, for instance by combining the values as the root of a mean of powers. A pseudocode algorithm for computing document group identity by recursively traversing the content tree is provided below. It calculates a simple weighted average, where the weight wg can vary with tree level L. To find the DocumentGroupIdentity, call this routine on the root node of the content tree.
DocumentGroupIdentity(G)
{
  if G is a leaf node
    then return 1
  otherwise
    for each child C of G
      call DocumentGroupIdentity(C)
    A = the average of these values
    return wg * GroupIdentity(G) + (1 - wg) * A
}

[0289] The particular methods for evaluating group identity provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining group identity should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to group identity; such that the present invention is directed not only to the particular method of determining group identity but also to the much broader concept of using a measure of individual group identity in a determination of the document's overall group identity.

[0290] A combination of measures, as illustrated in Figure 52, is useful in evaluating the document's group identity. 
[0291] More specifically, the group identity, as illustrated in Figure 52, is considered a combination of the spatial coherence, consistency of style, structural identity, horizontal alignment, vertical alignment, heading, background, and/or border. In Figure 52, the quantized group identity value is derived by combining the spatial coherence, consistency of style, structural identity, horizontal alignment, vertical alignment, heading, background, and/or border using a group identity quantizer or combiner circuit 27.

[0292] Although the illustration shows a circuit for the group identity quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.
[0293] As discussed above, the content group ease of use is calculated as a combination of the measures of contributing factors. The factors can include separability, distinguishability, locatability, searchability, and/or group identity. These factors can be calculated from the relations of the group elements with one another and from the relations of group elements with non-group neighbors.

[0294] These relations can include, for example, spatial coherence, spatial separation, alignment separation, heading separation, background separation, and/or style separation. If each factor is defined to produce a value ranging between 0 and 1, where 0 means a low or bad ease-of-use contribution to a quality value and 1 means a high or good one, these (and possibly other such rules) can be calculated and combined to form a measure of the overall contribution to ease of use from the treatment of content groups. If V_i is the value calculated for the i-th rule, then the group ease-of-use measure V_EU is formed as a function E of these contributions:

$$V_{EU} = E(V_1, V_2, \ldots, V_N)$$

[0295] The combining function E can be as simple as a weighted average of the contributions, but because any bad contributor can ruin the ease of use no matter how good the others are, a linear combination is not preferred. An alternative is to use:

$$V_{EU} = \left(\sum_i w_i\,(d + V_i)^{-p}\right)^{-1/p} - d$$

[0296] The w_i factors are the weights that specify the relative importance of each rule; they should sum to 1. The exponent p introduces the nonlinearity that can make one bad value overwhelm many good ones. The larger p is, the greater this effect.

[0297] Other combining functions are possible; for example, one could take the product of the contributions. If weighting of the contributions is desired, this can be done by exponentiation:

$$V_{EU} = \prod_i V_i^{\,w_i}$$
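Both ease-of-use combining functions, the power form of [0295] and the weighted product of [0297], can be sketched in Python as follows. The text does not give a value for d in [0295]; a small d = 0.01 is assumed here, by analogy with the small constant used to prevent division by zero in the locatability combinations.

```python
import math


def ease_of_use_power(values, weights, d: float = 0.01, p: float = 2.0) -> float:
    """V_EU = (sum_i w_i * (d + V_i)**(-p))**(-1/p) - d; low values dominate."""
    bracket = sum(w * (d + v) ** (-p) for v, w in zip(values, weights))
    return bracket ** (-1.0 / p) - d


def ease_of_use_product(values, weights) -> float:
    """V_EU = prod_i V_i**w_i; any zero contribution forces the result to 0."""
    return math.prod(v ** w for v, w in zip(values, weights))
```

Both forms return the common value when all contributions are equal, and both collapse toward 0 when any single contribution is 0, which is the "one bad factor spoils the quality" behavior described above. `math.prod` requires Python 3.8 or later.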

[0298] The particular methods for evaluating content group ease of use provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining group ease of use should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to group ease of use; such that the present invention is directed not only to the particular method of determining ease of use but also to the much broader concept of using a combination of individual group property measures in the context of evaluating document ease of use and document quality.

[0299] A combination of ease of use measures, as illustrated in Figure 35, is useful in evaluating the document's ease of use.

[0300] More specifically, the group ease of use, as illustrated in Figure 35, is considered a combination of separability, distinguishability, locatability, searchability, and/or group identity. In Figure 35, the quantized group ease of use value is derived by combining the separability, distinguishability, locatability, searchability, and/or group identity using an ease of use quantizer or combiner circuit 20.
[0301] Although the illustration shows a circuit for the ease of use quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described herein may be used.

EYE-CATCHING ABILITY 


[0302] For some documents, such as advertisements and warning labels, it is important that the documents catch 
the viewer's eye and attention. An important property contributing to the quality of these documents is therefore the 
eye-catching ability of a given layout. The present invention provides a method of calculating such an eye-catching 
measure. 

[0303] Eye-catching ability is calculated as a combination of simpler properties. If any of the simpler eye-catching properties is strongly present, then the overall effect is an eye-catching document. Contributing factors can include colorfulness, color dissonance, font size, information lightness, picture fraction, and/or novelty. Each factor is defined so as to produce a value ranging between 0 and 1, where 0 means a low or bad eye-catching value and 1 means a high or good eye-catching value. These (and possibly other such rules) can be calculated and combined to form an overall eye-catching measure. If V_i is the value calculated for the i-th rule, then the eye-catching measure V_EC is formed as a function E of these contributions:

$$V_{EC} = E(V_1, V_2, \ldots, V_N)$$

[0304] The combining function E can be as simple as a weighted average of the contributions, but because any good contributor can lead to an eye-catching document no matter how bad the others are, a linear combination is not preferred. An alternative is to use:

$$V_{EC} = d - \left[\sum_i w_i\,(d - V_i)^{-p}\right]^{-1/p}$$

[0305] Here $d$ is a number slightly larger than 1. The closer the value of $d$ is to 1, the more strongly a good value will compensate for all other values. The $w_i$ factors are the weights that specify the relative importance of each rule; they should sum to 1. The exponent $p$ introduces a nonlinearity that can further increase the strength with which one good value can overwhelm many bad ones; the larger $p$ is, the greater this effect. Note that this formula for combining the contributing factors differs from the preferred method for combining aesthetics factors or ease-of-use factors. In the cases of aesthetics and/or ease-of-use, any bad factor would spoil the quality, so when combining, any low contribution will lead to a low result. For eye-catching ability, however, any good factor will rescue the others, and when combining, any high contribution will lead to a high result.

[0306] Other combining functions are possible; for example, one could take the complement of the product of the complemented contributions. If weighting of the contributions is desired, this can be done by exponentiation: $V_{EC} = 1 - \prod_i (1 - V_i)^{w_i}$
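The product form of [0306] is shorter still. The sketch below assumes, as before, factor values in [0, 1] and non-negative weights; the function name is illustrative.

```python
def combine_product(values, weights):
    """V_EC = 1 - prod_i (1 - V_i)^(w_i).

    Any V_i equal to 1 (with a positive weight) zeroes the product and drives
    V_EC to 1; a weight of 0 removes that factor from the combination.
    """
    remaining = 1.0
    for v, w in zip(values, weights):
        remaining *= (1.0 - v) ** w
    return 1.0 - remaining
```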

[0307] The particular methods for evaluating the ability of the document to catch the eye provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining eye-catching ability should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to the ability to catch the eye; the present invention is directed not only to the particular method of determining eye-catching ability, but also to the much broader concept of using a combination of individual measures in the context of evaluating document eye-catching ability and document quality.
[0308] A combination of measures, as illustrated in Figure 58, is useful in evaluating the document's eye-catching 
ability. 

[0309] More specifically, the eye-catching ability, as illustrated in Figure 58, is considered a combination of colorfulness, color dissonance, font size, information lightness, picture fraction, and/or novelty. In Figure 58, the quantized eye-catching ability value is derived by combining the colorfulness, color dissonance, font size, information lightness, picture fraction, and/or novelty using an eye-catching ability quantizer or combiner circuit 30.
[0310] Although the illustration shows a circuit for the eye-catching ability quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described below may be used.

[0311] In a preferred embodiment of the present invention, color is eye-catching, and a bright orange page can capture attention better than a gray one. The primary property of color of interest here is saturation (or chrominance). There are several possible ways to calculate an approximate saturation value that can be used in determining the overall colorfulness of a document or a page. Perhaps the simplest calculation for colors expressed in an RGB color space is $c = \max(R, G, B) - \min(R, G, B)$, where $c$ is the saturation (or chrominance), as illustrated by Figure 59, and max and min are the maximum and minimum functions respectively.

[0312] An alternative calculation is $c = \sqrt{E^2 + S^2}$, where $E = R - G$ and $S = (R + G)/2 - B$.

[0313] When colors are expressed in the L*a*b* color space, the chrominance can be calculated as $c = \sqrt{(a^*)^2 + (b^*)^2}$.
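The three chrominance approximations of [0311] to [0313] translate directly into code. The sketch assumes RGB components normalized to [0, 1]; the function names are hypothetical.

```python
import math


def chroma_max_min(r, g, b):
    """c = max(R, G, B) - min(R, G, B)."""
    return max(r, g, b) - min(r, g, b)


def chroma_opponent(r, g, b):
    """c = sqrt(E^2 + S^2) with E = R - G and S = (R + G)/2 - B."""
    e = r - g
    s = (r + g) / 2.0 - b
    return math.hypot(e, s)


def chroma_lab(a_star, b_star):
    """c = sqrt((a*)^2 + (b*)^2) for colors given in L*a*b*."""
    return math.hypot(a_star, b_star)
```

All three give 0 for a neutral gray, which is the property the colorfulness measure relies on.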

[0314] The color saturation values are weighted by the area of the colored objects, and the sum is divided by the total document area to yield a colorfulness measure: $V_c = \sum_i c_i A_i / A_d$, where $V_c$ is the colorfulness measure, $c_i$ is the saturation value for the $i$th object, $A_i$ is that object's area, and $A_d$ is the area of the entire document. The sum is over all objects visible in the document.
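A sketch of the area-weighted colorfulness measure of [0314] follows. It assumes each visible object has already been reduced to a (saturation, area) pair and that objects do not overlap; overlapping objects could push the result above 1. The function name is illustrative.

```python
def colorfulness(objects, document_area):
    """V_c = sum_i c_i * A_i / A_d.

    `objects` is an iterable of (saturation, area) pairs with saturation in
    [0, 1]; `document_area` is A_d in the same units as the object areas.
    """
    if document_area <= 0:
        raise ValueError("document area must be positive")
    return sum(c * a for c, a in objects) / document_area
```

For example, a page half covered by fully saturated color and half by gray scores 0.5.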

[0315] The particular methods for evaluating content colorfulness provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining colorfulness should be considered within the scope of the present invention, for example, a function of measured human responses to differing amounts and types of color; the present invention is directed not only to the particular method of determining colorfulness but also to the concept of using colorfulness measures in the context of evaluating document eye-catching ability and document quality.

[0316] In a preferred embodiment of the present invention, when multiple colors are present on a page, it is not only 
the amount of color saturation present that is important, but also how harmonious those colors are. For example, pink 
and green go together much more harmoniously than pink and orange. Colors that clash will catch the eye. A contributor 
to the eye-catching property is therefore the color dissonance. 

[0317] In the following discussion, the calculation of color dissonance is described for the objects that can be seen 
together (i.e. the objects on a page). If the document has multiple pages, then an average color dissonance value for 
all pages can be determined. 

[0318] The color dissonance (or harmony) between two colors is largely determined by their hue difference (although the colors should have sufficient saturation and area to be noteworthy).

[0319] There are several methods known in the art for calculating an approximate hue value as an angle for the chrominance components. For example, using the $E$ and $S$ values described above, one can define the hue as $h = \arctan(S/E)$.

[0320] As is well known in the art, the case $E = 0$ needs special handling, and the signs of $E$ and $S$ should be checked to determine the quadrant, in order to avoid confusing $S/E$ with $(-S)/(-E)$. The result can also be divided by $2\pi$ to yield a value between 0 and 1.

[0321] In the L*a*b* color space, a similar calculation can be performed, giving $h = \arctan(b^*/a^*)$.
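In Python, the quadrant bookkeeping of [0320] is handled by `math.atan2`, which takes the numerator and denominator separately. The sketch below (function names are illustrative) returns the hue as a fraction of a full turn and treats achromatic colors, whose hue is undefined, as `None`.

```python
import math


def _unit_turn(angle):
    """Map an angle in radians to a fraction of a full turn (0 to 1)."""
    return (angle / (2.0 * math.pi)) % 1.0


def hue_from_opponent(r, g, b):
    """h = arctan(S / E), E = R - G, S = (R + G)/2 - B, with correct quadrant."""
    e = r - g
    s = (r + g) / 2.0 - b
    if e == 0 and s == 0:
        return None  # achromatic: hue is undefined
    return _unit_turn(math.atan2(s, e))


def hue_from_lab(a_star, b_star):
    """h = arctan(b* / a*) in L*a*b*, with correct quadrant."""
    if a_star == 0 and b_star == 0:
        return None
    return _unit_turn(math.atan2(b_star, a_star))
```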
[0322] Another method, described by A. R. Smith, for calculating an approximate hue value is expressed as the following pseudocode:
v = max(R, G, B)
w = min(R, G, B)
c = v - w
if (c == 0)
    the color is gray and its hue is undefined; skip it
r1 = (v - R)/c
g1 = (v - G)/c
b1 = (v - B)/c
if (R == v)
    if (G == w)
        h = 5 + b1
    else
        h = 1 - g1
else if (G == v)
    if (B == w)
        h = 1 + r1
    else
        h = 3 - b1
else
    if (R == w)
        h = 3 + g1
    else
        h = 5 - r1
h = h/6
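The Smith pseudocode above can be rendered as runnable Python roughly as follows. The sketch assumes RGB components in [0, 1], returns `None` for grays, and wraps the value 1 (pure red approached from the magenta side) back to 0; the function name is illustrative.

```python
def smith_hue(r, g, b):
    """Hue in [0, 1) following the case analysis of Smith's hexcone model."""
    v = max(r, g, b)
    w = min(r, g, b)
    c = v - w
    if c == 0:
        return None  # gray: hue undefined
    r1 = (v - r) / c
    g1 = (v - g) / c
    b1 = (v - b) / c
    if r == v:
        # Red is largest: magenta-to-red (G smallest) or red-to-yellow.
        h = 5 + b1 if g == w else 1 - g1
    elif g == v:
        # Green is largest: yellow-to-green (B smallest) or green-to-cyan.
        h = 1 + r1 if b == w else 3 - b1
    else:
        # Blue is largest: cyan-to-blue (R smallest) or blue-to-magenta.
        h = 3 + g1 if r == w else 5 - r1
    return (h / 6.0) % 1.0
```

For the six primaries and secondaries this yields red 0, yellow 1/6, green 1/3, cyan 1/2, blue 2/3, and magenta 5/6.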



[0323] In order to calculate the color dissonance, one must first determine which hues, as illustrated in Figure 60, are present with sufficient strength to matter. For each object on the page, calculate its color saturation and area as described above. Lightly saturated objects should not contribute strongly. One way to achieve this is to compare the saturation to a threshold and ignore objects with insufficient saturation (i.e. $c_i$ must be greater than $T_c$, where $T_c$ is the threshold).

[0324] Another approach is to weight the object area by saturation, as in $A_i' = A_i c_i$. Other variations, such as raising the saturation to a power before using it to weight the area, are possible.

[0325] The identified colored areas can be summed across all the objects in order to determine how much area in each hue can be seen on the page. The areas can be collected in a table $H$ of $n$ possible hue buckets by means of a pseudocode expression such as $H[\lfloor n h_i \rfloor] = H[\lfloor n h_i \rfloor] + A_i'$, where $h_i$ is the hue (between 0 and 1) of the $i$th object and $A_i'$ is its weighted area.
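The thresholding of [0323], the saturation weighting of [0324], and the bucketing of [0325] can be combined into one pass. In the sketch below, the bucket count `n = 12` and threshold `T_c = 0.1` are assumed values and the function name is hypothetical. The text presents thresholding and weighting as alternatives; setting `threshold=0` or `weight_by_saturation=False` selects either approach alone.

```python
def hue_histogram(objects, n=12, threshold=0.1, weight_by_saturation=True):
    """Accumulate colored area into n hue buckets: H[floor(n*h_i)] += A_i'.

    `objects` holds (hue, saturation, area) tuples, with hue in [0, 1) or
    None for achromatic objects. Objects whose saturation does not exceed
    `threshold` (T_c) are skipped.
    """
    H = [0.0] * n
    for hue, sat, area in objects:
        if hue is None or sat <= threshold:
            continue
        weighted = area * sat if weight_by_saturation else area  # A_i'
        bucket = min(int(n * hue), n - 1)  # guard against hue == 1.0
        H[bucket] += weighted
    return H
```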
[0326] To determine the color dissonance, compare every color hue found with every other color hue found; that is, compare all of the colors represented by the H table with one another. The H table gives the amount of area seen in each color hue and can be used to ignore cases where the total area of a color is too small to matter. An alternative to collecting the colors for the objects on a page is to compare the color of each object with the colors of its neighbors. Regardless of which method is used, the results from all comparisons must be combined in some way. A simple way of doing this is to keep the maximum dissonance value encountered. A pseudocode example is as follows:

Vd = 0
for i from 0 to n-1
{
    for j from i to n-1
    {
        dissonance = calculateDissonance(i, j, H[i], H[j])
        if dissonance > Vd
            Vd = dissonance
    }
}

The calculateDissonance function might look as follows:

calculateDissonance(i, j, ai, aj)
{
    if ai > bigEnough and aj > bigEnough
        return dissonanceTable[j - i]
    otherwise
        return 0
}

where bigEnough is a threshold value used to ignore small areas of color and dissonanceTable is a table of color dissonance values indexed by the difference between hue buckets.
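The maximum-dissonance pseudocode above corresponds to the following Python sketch. The table is assumed to hold one entry per bucket difference (length `n`), and `big_enough` is the area threshold; both are supplied by the caller, and the function names are illustrative.

```python
def calculate_dissonance(i, j, ai, aj, table, big_enough):
    """Look up dissonance by bucket difference; ignore hues with little area."""
    if ai > big_enough and aj > big_enough:
        return table[j - i]
    return 0.0


def max_dissonance(H, table, big_enough):
    """V_d: the largest dissonance over all pairs of hue buckets (j >= i)."""
    n = len(H)
    vd = 0.0
    for i in range(n):
        for j in range(i, n):
            vd = max(vd, calculate_dissonance(i, j, H[i], H[j], table, big_enough))
    return vd
```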

[0327] Using a table allows any desired function shape to be used; however, direct calculation of the dissonance value is also possible. The dissonance table captures the model of color harmony and dissonance. A simple model is that the harmony of colors depends only on their hue difference and not on the absolute hues themselves. Using this model, the dissonance table need only be indexed with the hue difference. An example of such a model is one in which colors with hue angles that are similar (near 0 degrees apart), opposite (180 degrees apart), or a third of the way round the hue circle (120 degrees apart) are considered harmonious, while other hue angle differences are dissonant. The values stored in the dissonance table would look similar to those depicted graphically in Figure 96.
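One hypothetical way to fill such a table for the model just described: dissonance is 0 at bucket differences of 0, 120, and 180 degrees (measured around the hue circle), rises linearly with the angular distance to the nearest of those, and saturates at 1. The 30-degree ramp width and the function name are assumptions, not values taken from Figure 96.

```python
def make_dissonance_table(n, ramp_deg=30.0):
    """Table of dissonance by hue-bucket difference k = 0 .. n-1."""
    harmonious = (0.0, 120.0, 180.0)
    table = []
    for k in range(n):
        diff = 360.0 * k / n
        diff = min(diff, 360.0 - diff)  # circular difference in [0, 180]
        nearest = min(abs(diff - a) for a in harmonious)
        table.append(min(nearest / ramp_deg, 1.0))
    return table
```

With `n = 12` (30-degree buckets) this yields 0 at differences of 0, 120, 180, and 240 degrees and 1 everywhere else.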

[0328] Alternative calculations are possible. For example, one might weight each dissonance look-up by the product of the areas of the two color hues being considered and sum this weighted dissonance result over all comparisons. This sum should be normalized by dividing by the sum of all area products (without the dissonance result factored in). This calculation gives more of an overall average dissonance measure instead of a maximum dissonance. The particular methods for evaluating content color dissonance provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining color dissonance should be considered within the scope of the present invention, for example, a function of measured human responses to differing amounts and types of color; the present invention is directed not only to the particular method of determining color dissonance but also to the concept of using a color dissonance measure in the context of evaluating document eye-catching ability and document quality.
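The area-weighted average of [0328] can be sketched as follows, using the H table and a dissonance table indexed by bucket difference. As in the maximum version, pairs are taken with j ≥ i; that choice and the function name are assumptions.

```python
def average_dissonance(H, table):
    """Sum of H[i]*H[j]*table[j-i] divided by the sum of H[i]*H[j], j >= i."""
    n = len(H)
    weighted = 0.0
    norm = 0.0
    for i in range(n):
        for j in range(i, n):
            product = H[i] * H[j]
            weighted += product * table[j - i]
            norm += product
    return weighted / norm if norm > 0 else 0.0
```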

[0329] In a preferred embodiment of the present invention, another mechanism for catching the eye is to use large fonts. This makes the text readable from a distance and gives it a feeling of importance. This mechanism can be used even when the document is presented in black and white. It is the maximum font size that is important here (not the average). It can be found by stepping through all the fonts used (or stepping through all the text and finding the fonts) and keeping track of the largest. The maximum font size found should be converted to a number between 0 and 1 for combination with the other measures.

[0330] One way to do this is $V_f = f/(f_n + f)$, where $f$ is the maximum font size found and $f_n$ is close to the typical font size found in documents (e.g. 8 or 10 point).
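A sketch of the font-size measure of [0329] and [0330], assuming font sizes in points and taking f_n = 10 point as the default (one of the typical sizes mentioned); the function name is illustrative.

```python
def font_size_measure(font_sizes, fn=10.0):
    """V_f = f / (f_n + f), where f is the largest font size used."""
    f = max(font_sizes, default=0.0)
    return f / (fn + f) if f > 0 else 0.0
```

A 30-point headline on an otherwise 10- and 12-point page gives 30 / 40 = 0.75.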

[0331] One can also consider weighting the largest font by a function of the number of characters. However, while increasing the number of characters may make the document more eye-catching when only a few characters are present, the effect may diminish for large numbers of characters.

[0332] The impact of font size can be calculated by considering all of the fonts within a document simultaneously; alternatively, one could determine the impact of each page separately and then combine the results of the pages. Combining page results could be done by a simple average, and this may be appropriate for documents such as presentations. However, for many documents it is sufficient for only one page to be eye-catching (e.g. the cover page), and it may be better to employ a non-linear combining method that gives a high score if any of the individual page contributions are high. Alternatively, one might use a weighted average in which the first page is weighted more heavily than the others.

[0333] The particular methods for evaluating font size impact provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining font size should be considered within the scope of the present invention, for example, a function of measured human responses to differing sizes and types of fonts; the present invention is directed not only to the particular method of determining a font size measure but also to the concept of using font and font size measures in the context of evaluating document eye-catching ability and document quality.

[0334] In a preferred embodiment of the present invention, a page that is densely packed with information will typically require that information to be small and uniform, and is therefore unlikely to catch the eye. This is not as hard-and-fast an indicator as color or font size, because the information might, for example, be presented as a mixture of easy-to-ignore small black text and eye-catching large colored text. Nevertheless, one can use the information lightness (the inverse of information density) as another clue to the document's eye-catching behavior.

[0335] For text, a rough measure of the information present is just the number of characters $N_c$ used to encode the information. One might also consider alternative measures, such as a count of the number of words.
[0336] For graphic figures, one can count the number of primitive graphical constructs (lines, rectangles, circles, arcs, strokes, triangles, polygons, etc.) used to build the figures. The count of graphic constructs $N_g$ may be multiplied by a scaling value to normalize it with respect to the text measure.

[0337] Estimating the information content $N_p$ of pictorial images is more problematic. One simple approach is to include a constant information estimate for each image.

[0338] An alternative approach is to sum the variance of the pixel values from their neighborhood values and divide by the image area. Other schemes can also be used to estimate the information found in pictures. This estimate may also require a scaling factor to match its measure to that for text. The total information would then be $N_t = N_c + s_g N_g + s_p N_p$.

[0339] The information density is the total information divided by the area of the document: $I_d = N_t / A_d$

[0340] To convert this to a number ranging between 0 and 1, one can again employ the following method: $V_{id} = I_d / (a + I_d)$, where $a$ is a constant on the order of the typical information density value.

[0341] One can define the information lightness as the complement of the normalized information density: $V_{il} = 1 - V_{id}$
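The chain from raw counts to information lightness ([0335] to [0341]) fits in one function. The scaling factors s_g and s_p and the constant a are document-dependent; the defaults of 1 below are placeholders only, and the function name is hypothetical.

```python
def information_lightness(n_chars, n_graphics, n_pictures, doc_area,
                          s_g=1.0, s_p=1.0, a=1.0):
    """V_il = 1 - V_id, with V_id = I_d / (a + I_d) and I_d = N_t / A_d."""
    n_total = n_chars + s_g * n_graphics + s_p * n_pictures  # N_t
    density = n_total / doc_area                              # I_d
    v_id = density / (a + density)                            # V_id in [0, 1)
    return 1.0 - v_id                                         # V_il
```

Algebraically, V_il = a / (a + I_d), so it equals 1 for an empty page and falls toward 0 as the information density grows.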
[0342] The particular methods for evaluating information density and lightness provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining information lightness should be considered within the scope of the present invention, for example, a function of measured human responses to differing amounts and areas of information; the present invention is directed not only to the particular method of determining information density or lightness, but also to the concept of using information lightness measures in the context of evaluating document eye-catching ability and document quality.

[0343] In a preferred embodiment of the present invention, pictures are more eye-catching than pure text. That is why paperback-book covers, which are intended to attract viewers to purchase the books, carry pictures, while the inside contains only simple text to convey the story. Of course, not all pictures are equally interesting, and a true measure of a picture's eye-catching ability would require some analysis of the picture content. Still, the mere presence of any pictures in a document is generally an indicator of greater eye-catching ability. A simple measure of this is the fraction of the document area devoted to pictorial images. If $A_p$ is the total picture area, a normalized measure is $V_p = A_p / A_d$.

[0344] The particular method for evaluating picture fraction provided herein is exemplary and is not to be considered as limiting in scope. Other methods for determining picture fraction should be considered within the scope of the present invention, for example, a function of measured human responses to differing amounts of pictorial information; the present invention is directed not only to the particular method of determining picture fraction but also to the concept of using a picture fraction measure in the context of evaluating document eye-catching ability and document quality.

[0345] In a preferred embodiment of the present invention, another indicator of a document's eye-catching ability is its novelty, that is, the presence of the unexpected or unconventional. Of course, to tell if something is unexpected or unconventional, one must first have some model of what is expected or conventional. Such models can be quite sophisticated and can include such factors as the type of document and its anticipated use. Here, however, the use of novelty is illustrated with a simple model, in which a single typical value is expected for each style parameter.
[0346] Style parameters are the available choices that govern the appearance and presentation of the document. 
They can include the presence of backgrounds and borders, the thickness of borders and rules, paragraph indentation 
and separation, list indentation, list bulleting, font style, font weight and so on. Style parameters also include font size 
and color selections, which were considered separately above. 

[0347] It is believed that it is proper to include color and font size in the estimation of novelty for completeness, but 
that they should also be singled out in the calculation of eye-catching ability since their contribution in this respect is 
much greater than would be explained by unconventionality alone. 

[0348] In the simple model, each style parameter $P_j$ has an anticipated value $P_{0j}$. For any style parameter, but particularly for parameters with binary (or enumerated) choices, one can simply add a constant novelty contribution $n_j$ if the actual style $P_j$ does not match the expected value $P_{0j}$. More sophisticated calculations are possible, for example when the style parameter can vary continuously from the expected value (as perhaps in the case of rule width or font size). In that case, a function of the style difference can be calculated as the novelty contribution: $n_j = F(P_j - P_{0j})$
[0349] For enumerated style values, one can employ a table look-up to yield more flexibility and control over the novelty contribution: $n_j = T[P_j]$

[0350] The overall document novelty can be found by taking the average of the novelty contributions for all style settings. Thus, if the document had $m$ style choices, the average novelty would be $V_n = \sum_j n_j / m$.
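For the simple novelty model of [0348] to [0350] with a constant contribution per mismatched parameter, a sketch looks like the following. The parameter names in the test are hypothetical, and a parameter missing from the expected-value map is counted as novel (an assumption); the function name is illustrative.

```python
def novelty(styles, expected, contribution=1.0):
    """V_n = sum_j n_j / m, with n_j = `contribution` when P_j != P_0j.

    `styles` maps each style parameter name to its actual value P_j;
    `expected` maps names to anticipated values P_0j.
    """
    if not styles:
        return 0.0
    total = sum(contribution for name, value in styles.items()
                if value != expected.get(name))
    return total / len(styles)
```

A continuous parameter could use a function F of the difference, or an enumerated one a look-up table T, in place of the constant `contribution`.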

[0351] The expected values $P_{0j}$ can be set a priori or, preferably, can be found by examining the style settings of typical documents. If they are determined by analyzing documents, the analysis can be conducted on an ongoing basis and the values can be allowed to adapt to the current typical document style.

[0352] In more sophisticated models, the expected style value may depend upon the location of the content item within the document's logical structure. Thus, the expected font style for a heading might be weighted differently from the expected setting for the body text. However it is calculated, novelty can provide a clue as to the document's ability to catch the eye.

[0353] The particular methods for evaluating novelty provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining novelty should be considered within the scope of the present invention, for example, a function of measured human responses to differing styles. The present invention lies not only in the particular method of determining novelty but also in the concept of using a novelty measure in the context of evaluating document eye-catching ability and document quality.

INTEREST 

[0354] A property of a document contributing to its quality that is similar to its eye-catching ability is the ability of the 
document to hold attention and interest. While a major contributor to the interest of a document is its subject matter, 
the presentation of that subject matter (the style and format) can affect the interest level as well. This invention provides 
a method of calculating an interest measure for the style and format decisions, calculated as a combination of simpler 
factors that contribute to interest. If any of the simpler interest factors is strongly present, then the overall effect is an 
interesting document. 

[0355] Factors can include variety, change rate, emphasis, graphic fraction, colorfulness, color dissonance, picture fraction, and/or novelty. Calculation methods are defined for each of these factors, and each is designed to produce a value ranging between 0 and 1, such that 0 means low or bad interest value and 1 means high or good interest value. These (and possibly other such factors) can be calculated and combined to form an overall interest measure $V_I$. The separate factors can be combined by a method similar to that described above for the eye-catching ability property.

[0356] The particular methods for evaluating the ability of the document to maintain interest provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining how well the document maintains interest should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to the ability to maintain interest; the present invention is directed not only to the particular method of determining the ability to maintain interest, but also to the much broader concept of using a combination of individual measures in the context of evaluating document interest and document quality.

[0357] As illustrated in Figure 62, another parameter or factor used in determining interest is the measurement and quantization of the document's variety.

[0358] In a preferred embodiment of the present invention, one way to make a document interesting to look at is to include a variety of styles in its presentation. Style parameters are the available choices that govern the appearance and presentation of the document. They can include the presence of backgrounds and borders, the thickness of borders and rules, paragraph indentation and separation, list indentation, list bulleting, font style, font weight, font size, color selections, and so on.

[0359] Style parameters can be grouped and associated with the logical structure of the content. For example, style 
parameters associated with a text string include the font family, font size, font style, font weight, and color. 
[0360] Style parameters associated with a paragraph include the indentation, line length, line spacing, before and 
after spacing and quadding. 

[0361] Style parameters associated with lists include left and right list indentation, bullet or numbering style, and bullet positioning.

[0362] In determining variety of style, one is counting the number of styles present in the document, but this raises the question of just what constitutes a different style. Should style parameters be considered individually or as a group?
[0363] For example, if a document contains a 12-point bold weight font and a 10-point normal weight font, is that four styles (two sizes plus two weights) or just two styles (two fonts)? The answer for the preferred embodiment is two, and the styles should be considered in combination.

[0364] But this still leaves the question of what combinations should be considered. If the 12-point bold is used in a list without bullets, and the 10-point normal is used in a list with bullets, is this still only two styles, or should the list styles and font styles be considered independently? This answer is less clear.
[0365] But if one considers the correct grouping to be the entire set of style parameters, so that whenever any style parameter changes a new overall style is generated, there is the potential for a combinatorial explosion of style instances. While this approach is not ruled out, the preferred method is to group the style parameters according to their associated content type (i.e. text styles, paragraph styles, graphic styles, list styles, table styles, content element background styles, etc.).

[0366] Thus, in the above example, one would have two text styles and two list styles, for four style choices in the document. This approach also avoids the problems arising from the growth of style parameters from the hierarchical structure of a document. If the document contains lists of lists of lists, the preferred approach gives three instances of the simple list style group instead of some new large group containing all the style choices of the structure.
[0367] To estimate the style variety, first decide what style parameters and parameter groups to include in the analysis. For example, one might decide to consider just the text, paragraph, and graphic styles. For text, consider font family, size, weight, style, and color. For graphics, consider fill color, edge color, and edge thickness. For paragraphs, consider line length, line spacing, quadding, and first-line indentation.

[0368] Three lists are constructed, one for each type of style group. The list elements contain the values of the style parameters for that group. One then steps through the document's logical structure, examining each logical element being analyzed for its style settings (in this example, each text segment, graphic element, and paragraph). One considers the style parameter settings of each logical content element and checks the corresponding list to see if an entry has already been made with a matching set of values.

[0369] If a matching list entry is found, nothing more need be done for this content element. If, however, the list does 
not contain a match, a new list element containing the new set of style values should be constructed and added to the list. 

[0370] At the end of the document analysis, the lists should contain all of the style parameter combinations that were discovered. One can then simply count the number of list elements to determine the number of styles used. The sizes of all the lists should be combined into an overall style count. One can weight the list sizes when adding them together if one wishes to make the variety of one form of content count more than that of another (for example, one might make variety in paragraph style count more than variety in graphics). The result would be an overall weighted style count $s = \sum_x w_x s_x$, where $s_x$ is the size of the $x$th style list and $w_x$ is its weight.

[0371] In order to combine the style variety measure with the other contributions to interest, this weighted count should be converted to a number ranging between 0 and 1. This can be done as follows: $V_v = s/(a_s + s)$, where $V_v$ is the variety measure and $a_s$ is a constant about the size of the expected number of styles in a typical document. Figure 62 is an example of high variety.
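The list-based variety count of [0368] to [0371] can be sketched with Python sets, which perform the "is there already a matching entry?" check implicitly. Representing each group's style settings as a hashable tuple, the default weight of 1 for groups without an explicit weight, a_s = 10, and the function name are all assumptions.

```python
def style_variety(elements, weights, a_s=10.0):
    """V_v = s / (a_s + s), with s = sum_x w_x * s_x.

    `elements` is a sequence of (group, style) pairs in document order, where
    `style` is a hashable tuple of that group's style parameter values.
    s_x is the number of distinct style tuples found for group x.
    """
    distinct = {}  # group -> set of style tuples seen so far
    for group, style in elements:
        distinct.setdefault(group, set()).add(style)
    s = sum(weights.get(group, 1.0) * len(found)
            for group, found in distinct.items())
    return s / (a_s + s)
```

For the example of [0366] (two text styles and two list styles, unit weights), s = 4.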

[0372] The particular methods for evaluating the variety of the document content and style provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining variety should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to variety; the present invention is directed not only to the particular method of determining variety, but also to the much broader concept of using variety measures in the context of evaluating document interest and document quality.

[0373] As illustrated in Figure 63, another parameter or factor used in determining interest is the measurement and 
quantization of the document's change rate. 

[0374] In a preferred embodiment of the present invention, it is not only the variety of styles in a document that holds interest, but also the rate at which the style changes. There may be only two style combinations represented, but if the document frequently switches back and forth between them, it is more interesting than if it changes only once.
[0375] Calculating the style change rate is similar to calculating the style variety as described above, and uses the same style parameters and groupings. However, one need only maintain a single description of the most recently encountered style parameter set for each group (instead of a list of all previously encountered sets). For example, there would be a single set of the most recently encountered text style parameters, a single set of the most recently encountered graphic style parameters, and a single set of the most recently encountered paragraph parameters. Step through the document's logical description and examine the style settings. Whenever a content element has style parameters that differ from those seen most recently, the change count for that style group is incremented, and the new set of style values is remembered for use with the next content element. In a manner similar to the variety calculation, the change counts can be weighted and combined to form a total weighted change count $c = \sum_x w_x c_x$, where $c_x$ is the change count for the $x$th style group and $w_x$ is its weight.

In order to combine the style change rate measure with the other contributions to interest, this weighted count should be converted to a number ranging between 0 and 1. This can be done as follows: $V_{ch} = c / (a_{ch} + c)$, where $V_{ch}$ is the change rate measure and $a_{ch}$ is a constant about the size of the expected number of style changes in a typical document. Figure 63 is an example of high change rate.
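The change-rate count of [0375] differs from the variety count only in remembering the last style per group. In this sketch the first element of each group is not counted as a change (an assumption the text leaves open), groups without an explicit weight get weight 1, a_ch = 20 is a placeholder, and the function name is illustrative.

```python
def style_change_rate(elements, weights, a_ch=20.0):
    """V_ch = c / (a_ch + c), with c = sum_x w_x * c_x.

    `elements` is a sequence of (group, style) pairs in document order.
    c_x counts how often group x's style differs from the one seen most
    recently in that same group.
    """
    last = {}     # group -> most recently encountered style
    changes = {}  # group -> change count c_x
    for group, style in elements:
        if group in last and last[group] != style:
            changes[group] = changes.get(group, 0) + 1
        last[group] = style
    c = sum(weights.get(group, 1.0) * count for group, count in changes.items())
    return c / (a_ch + c)
```

Alternating between two styles four times produces three changes, whereas using each style in one block produces only one, matching the intuition of [0374].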

[0376] The particular methods for evaluating the change rate of the document style provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining change rate should be considered within the scope of the present invention, for example, a function of measured human responses to differing document style characteristics with respect to perceived change rate; the present invention is directed not only to the particular method of determining change rate, but also to the much broader concept of using change rate measures in the context of evaluating document interest and document quality.

[0377] In a preferred embodiment of the present invention, some font styles are chosen to emphasize the text. Large text, bold text, and underscored text all have an implied importance over the normal text presentation. This implied importance tells the reader to wake up and pay attention, and as such it makes a special contribution to the maintenance of viewer interest. One can calculate an average emphasis measure for the text in a document by summing an emphasis value for each character and then dividing by the total number of characters: $V_e = \sum_t e(t) / n_c$, where $V_e$ is the emphasis measure, $e(t)$ is the emphasis function for character $t$, the sum is over all characters, and $n_c$ is the total number of characters.

[0378] The function $e(t)$ should include factors for the size of the text, its weight, its variant, and its contrast (other factors, such as font style, might also be included). The larger the font size, the greater the emphasis, but one would like to have a factor that ranges between 0 and 1. An expression such as $\mathrm{size}(t)/(a_{fs} + \mathrm{size}(t))$, where $a_{fs}$ is a constant about the size of a typical font, will do this. The font weight (e.g. light, normal, bold, heavy) is typically an enumerated value, and a table of suitable emphasis factors for each weight, $e_w[\mathrm{weight}(t)]$, can be used in the emphasis function. Similarly, the font variant (e.g. normal, underlined, strikethrough, outlined) can be handled as a table look-up such as $e_v[\mathrm{variant}(t)]$.

[0379] Contrast also plays a role in the strength of text emphasis. Text with low contrast to the background will not have the same degree of impact as high contrast text. The luminance contrast can be calculated as described above as $2|Y_b - Y_f|/(Y_b + Y_f)$, where $Y_b$ is the luminance of the background and $Y_f = \mathrm{Lum}(t)$ is the luminance of the text. An example of an emphasis function is then:

$$e(t) = \frac{\mathrm{size}(t)}{a_{fs} + \mathrm{size}(t)}\; e_w[\mathrm{weight}(t)]\; e_v[\mathrm{variant}(t)]\; \frac{2\,|Y_b - \mathrm{Lum}(t)|}{Y_b + \mathrm{Lum}(t)}$$
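The emphasis function and its per-character average can be sketched as below. This is an illustration under stated assumptions: the weight and variant tables `EW` and `EV`, the default $a_{fs} = 12$ points, the `StyledChar` record, and the rule that two zero luminances give zero contrast are all hypothetical choices, not values from the text.

```python
from dataclasses import dataclass

# Hypothetical emphasis tables; the text only says such look-up tables can be used.
EW = {"light": 0.4, "normal": 0.5, "bold": 0.8, "heavy": 1.0}
EV = {"normal": 0.5, "underlined": 0.8, "strikethrough": 0.6, "outlined": 0.7}


@dataclass
class StyledChar:
    size: float       # font size in points, size(t)
    weight: str       # key into EW
    variant: str      # key into EV
    luminance: float  # text luminance Lum(t)


def luminance_contrast(y_b: float, y_f: float) -> float:
    """2|Yb - Yf| / (Yb + Yf); ranges from 0 to 2. Returns 0 if both are 0."""
    total = y_b + y_f
    return 0.0 if total == 0 else 2 * abs(y_b - y_f) / total


def emphasis(t: StyledChar, y_b: float, a_fs: float = 12.0) -> float:
    """e(t): product of the size, weight, variant, and contrast factors."""
    size_factor = t.size / (a_fs + t.size)
    return size_factor * EW[t.weight] * EV[t.variant] * luminance_contrast(y_b, t.luminance)


def emphasis_measure(chars: list[StyledChar], y_b: float, a_fs: float = 12.0) -> float:
    """V_e: sum of e(t) over all characters divided by the character count n_c."""
    if not chars:
        return 0.0
    return sum(emphasis(t, y_b, a_fs) for t in chars) / len(chars)


# Black 12-point bold text on a white background (Yb = 1, Lum(t) = 0).
bold = StyledChar(size=12, weight="bold", variant="normal", luminance=0.0)
print(emphasis(bold, y_b=1.0))  # 0.4
```

Because the contrast term as written can reach 2, this sketch does not by itself keep $e(t)$ inside [0, 1]; a caller that needs that range would have to rescale.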

[0380] Note that one might also include other characteristics such as the font style (e.g. italic). The particular methods for evaluating emphasis provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining emphasis should be considered within the scope of the present invention, for example, a function of measured human responses to differing document style characteristics with respect to emphasis; such that the present invention is directed not only to the particular method of determining emphasis, but also to the much broader concept of using emphasis measures in the context of evaluating document interest and document quality.

[0381] As illustrated in Figure 64, another parameter or factor used in determining interest is the measurement and quantization of the document's graphical fraction.

[0382] In a preferred embodiment of the present invention, graphical constructs are often used to explain or illustrate concepts and ideas. They also add variety to the content. As such, graphics can make a document more interesting, and so a measure of the graphical content should contribute to the estimation of how interesting the document is.

[0383] One simple measure of the graphical contribution is just a count of the graphical content objects encountered in the document.

[0384] An alternative approach is to sum the areas of the bounding boxes that enclose each of the graphical content objects encountered. This sum can then be divided by the total area of the document to yield a number ranging between 0 and 1.

[0385] A third approach is to examine the graphical content objects in greater detail and to count the primitive drawing objects, such as lines, curves, rectangles, polygons and ellipses, from which they are constructed. This approach gives a better measure of the complexity of the graphic and possibly a better measure of how interesting that graphic is. The counts for the various drawing primitives can be weighted to indicate how interesting each drawing primitive is (for example, an ellipse might be considered more interesting than a rectangle) and summed to give an overall weighted graphic count:

$$g = \sum_x w_x\, g_x$$

where $g_x$ is the count of the $x$th type of graphic construct and $w_x$ is its weight.

[0386] In order to combine the graphic fraction measure with the other contributions to interest, this weighted count should be converted to a number ranging between 0 and 1. This can be done as follows:

$$V_g = \frac{g}{a_g + g}$$

where $V_g$ is the graphic fraction measure and $a_g$ is a constant value about the size of the expected number of graphic drawing primitives in a typical document.

[0387] An alternative is to divide the count of graphic drawing primitives by a count of the total drawing primitives $N_{dp}$ in the document (including characters and images). This approach removes the dependence on the document size: $V_g = g / N_{dp}$. Figure 64 illustrates an example of a high graphical fraction.
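The weighted primitive count and the two normalizations can be sketched as below. This is a minimal sketch under assumptions: the `WEIGHTS` values are hypothetical, and the relative variant divides the unweighted graphic primitive count by $N_{dp}$ so that the result stays between 0 and 1; dividing the weighted $g$ instead would be another reading of the text.

```python
# Hypothetical per-primitive interest weights w_x (e.g. an ellipse counts more
# than a rectangle, as the text suggests); the numbers are illustrative only.
WEIGHTS = {"line": 1.0, "rectangle": 1.0, "polygon": 1.5, "curve": 2.0, "ellipse": 2.0}


def weighted_graphic_count(counts: dict[str, int],
                           weights: dict[str, float] = WEIGHTS) -> float:
    """g = sum over primitive types x of w_x * g_x."""
    return sum(weights[x] * g_x for x, g_x in counts.items())


def graphic_fraction_saturating(g: float, a_g: float) -> float:
    """V_g = g / (a_g + g), with a_g about the typical number of graphic primitives."""
    return g / (a_g + g) if g > 0 else 0.0


def graphic_fraction_relative(counts: dict[str, int], n_dp: int) -> float:
    """V_g = (graphic primitive count) / N_dp, where N_dp counts all drawing
    primitives in the document, including characters and images."""
    return sum(counts.values()) / n_dp if n_dp > 0 else 0.0


counts = {"line": 10, "ellipse": 5}
g = weighted_graphic_count(counts)              # 10*1.0 + 5*2.0 = 20.0
print(graphic_fraction_saturating(g, a_g=20))   # 0.5
print(graphic_fraction_relative(counts, 300))   # 0.05
```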

[0388] The particular methods for evaluating graphic fraction provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining graphic fraction should be considered within the scope of the present invention, for example, a function of measured human responses to differing document style characteristics with respect to graphic fraction; such that the present invention is directed not only to the particular method of determining graphic fraction, but also to the much broader concept of using graphic fraction measures in the context of evaluating document interest and document quality.

[0389] Several of the factors that attract attention and catch the viewer's eye will also serve to hold the attention and interest. The properties of colorfulness, color dissonance, picture fraction, and novelty are examples of this joint use. The difference in behavior between attention and interest is one of relative importance or weight. Colorfulness, for example, can be very important in catching the eye, but less important in maintaining interest. Novelty, on the other hand, can be more important to maintaining interest than it is to capturing attention. Methods for estimating the strength of these four measures were described above.
[0390] The particular methods for evaluating colorfulness, color dissonance, picture fraction, and novelty provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining these measures should be considered within the scope of the present invention, for example, a function of measured human responses to differing document color, picture and style characteristics with respect to the measures; such that the present invention is directed not only to the particular method of determining the measures, but also to the much broader concept of using colorfulness, color dissonance, picture fraction, or novelty measures in the context of evaluating document interest and document quality.

[0391] A combination of measures, as illustrated in Figure 61, is useful in evaluating the document's interest.
[0392] More specifically, the interest, as illustrated in Figure 61, is considered a combination of variety, change rate, emphasis, graphic fraction, colorfulness, color dissonance, picture fraction, and/or novelty. In Figure 61, the quantized interest value is derived by combining the variety, change rate, emphasis, graphic fraction, colorfulness, color dissonance, picture fraction, and/or novelty using an interest quantizer or combiner circuit 40.

[0393] It is noted that, although the illustration shows a circuit for the interest quantization process, this process may also be performed in software by the microprocessor and/or firmware. The quantization is not limited to specific circuits, but may be carried out by any combination of software and/or hardware that is able to carry out the methodologies described below.

COMMUNICABILITY 

[0394] Another factor contributing to the quality of a document design is how well that design aids in communicating the information contained within the document to the user. The present invention provides a method of calculating such a communicability measure.

[0395] As with aesthetics and ease-of-use, the approach to quantifying communicability is to evaluate factors identified as contributing to the effectiveness of the communication. These factors are then combined to form a composite measure. The factors contribute to the quality of the document design. If any of the simpler communicability factors is absent, then the overall ability of the document to communicate is reduced.

[0396] Component factors can include legibility, information lightness, technical level, text and image balance, red-green friendliness, ease of progression, and/or ease of navigation. Each factor can be defined so as to produce a value ranging between 0 and 1, where 0 means a low or bad communicability value and 1 means a high or good communicability value. These (and possibly other such) factors can be calculated and combined to form an overall communicability measure in a manner similar to that described above for aesthetics. If $V_i$ is the value calculated for the $i$th rule, then the communicability measure $V_{CM}$ is formed as a function $E$ of these contributions:

$$V_{CM} = E(V_L, V_{il}, V_{tl}, V_{tib}, V_{rg}, V_{ep}, \ldots, V_{en})$$

[0397] The combining function $E$ can be as simple as a weighted average of the contributions, but because any bad contributor can lead to a poorly communicating document, no matter how good the others are, a linear combination is not preferred. An alternative is to use:

$$V_{CM} = \left( \sum_i w_i\, (d + V_i)^{-p} \right)^{-1/p} - d$$

Here $d$ is a number slightly larger than 0. The closer the value of $d$ is to 0, the more strongly a bad value will cancel all other values. The $w_i$ factors are the weights that specify the relative importance of each rule; they should sum to 1. The exponent $p$ introduces a nonlinearity that can also increase the strength by which one bad value can overwhelm many good ones. The larger $p$ is, the greater this effect.
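The nonlinear combiner of [0397] can be sketched in Python as follows. This is a minimal sketch, assuming the factor scores $V_i$ are already in [0, 1]; the defaults $d = 0.01$ and $p = 2$ are hypothetical values chosen only to illustrate "slightly larger than 0" and a moderate nonlinearity.

```python
def combine_penalizing(values: list[float], weights: list[float],
                       d: float = 0.01, p: float = 2.0) -> float:
    """V = (sum_i w_i * (d + V_i) ** -p) ** (-1/p) - d.

    values are factor scores in [0, 1]; weights should sum to 1.
    A small d and a large p let one bad score dominate the result.
    """
    if not values or len(values) != len(weights):
        raise ValueError("need one weight per value")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    total = sum(w * (d + v) ** (-p) for v, w in zip(values, weights))
    return total ** (-1.0 / p) - d


# Seven factors: six good scores and one very bad one, equally weighted.
scores = [0.9] * 6 + [0.0]
weights = [1 / 7] * 7
print(combine_penalizing(scores, weights))
```

When all scores are equal the function returns that common score, so it behaves like an average for uniform inputs. In the usage lines the single zero drags the result below 0.02, whereas the plain weighted average of the same scores would be about 0.77.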

[0398] Other combining functions are possible, as mentioned above. The particular methods for evaluating the ability of the document to communicate provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining how well the document communicates should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to the ability to communicate; such that the present invention is directed not only to the particular method of determining the ability to communicate, but also to the much broader concept of using a combination of individual measures in the context of evaluating document communicability and document quality.

[0399] A combination of measures, as illustrated in Figure 65, is useful in evaluating the document's communicability.
[0400] More specifically, the communicability, as illustrated in Figure 65, is considered a combination of legibility, information lightness, technical level, text and image balance, red-green friendliness, ease of progression, and/or ease of navigation. In Figure 65, the quantized communicability value is derived by combining the legibility, information lightness, technical level, text and image balance, red-green friendliness, ease of progression, and/or ease of navigation using a communicability quantizer or combiner circuit 50.

[0401] It is noted that, although the illustration shows a circuit for the communicability quantization process, this process may also be performed in software by the microprocessor and/or firmware. The quantization is not limited to specific circuits, but may be carried out by any combination of software and/or hardware that is able to carry out the methodologies described below.
[0402] It is further noted that a combination of measures, as illustrated in Figure 66, is useful in evaluating the doc- 
ument's legibility. 

[0403] More specifically, the legibility, as illustrated in Figure 66, is considered a combination of decipherability, line retrace, relative line separation, and/or quadding. In Figure 66, the quantized legibility value is derived by combining the decipherability, line retrace, relative line separation, and/or quadding using a legibility quantizer or combiner circuit 51.

[0404] It is noted that, although the illustration shows a circuit for the legibility quantization process, this process may also be performed in software by the microprocessor and/or firmware. The quantization is not limited to specific circuits, but may be carried out by any combination of software and/or hardware that is able to carry out the methodologies described below.

[0405] In a preferred embodiment of the present invention, one of the first and foremost factors in estimating a document's communication effectiveness is the legibility of its text. Legibility measures the ease of following and recognizing the words of the document when reading. Legibility is itself a property that can be broken down into contributing components. As noted above, chief among these components are decipherability, line retrace, relative line separation, and/or quadding. Other factors that might also be considered include the word and character spacing and the use of hyphenation.

[0406] A combination of measures, as illustrated in Figure 67, is useful in evaluating the document's decipherability.
[0407] More specifically, the decipherability, as illustrated in Figure 67, is considered a combination of display device properties, font, character familiarity, and/or luminance contrast. In Figure 67, the quantized decipherability value is derived by combining the display device properties, font, character familiarity, and/or luminance contrast using a decipherability quantizer or combiner circuit 52.

[0408] It is noted that, although the illustration shows a circuit for the decipherability quantization process, this process may also be performed in software by the microprocessor and/or firmware. The quantization is not limited to specific circuits, but may be carried out by any combination of software and/or hardware that is able to carry out the methodologies described below.
[0409] In a preferred embodiment of the present invention, decipherability, the most complex of the legibility factors, measures the ability to recognize the letter shapes. It can itself be further broken down into simpler pieces. As noted above, factors that contribute to the decipherability include the display device, the font, the character familiarity, and/or the luminance contrast.

[0410] The properties of the display device and the font may often be considered together; that is, one determines how decipherable a particular font is on a particular device. For example, fonts with serifs are, as a rule, easier to decipher than sans serif fonts; but on a device that cannot effectively produce serifs, this may not be true. The font family, font size, font weight, font style, and font variant can all contribute to the decipherability.

[0411] An approach to dealing with the effect of font specification and device choice is to measure by experiment the decipherability (the ability to correctly determine the character presented) for a fully specified font on a particular device. This measurement can then be handled as a font property. Given the font specification, one can then look up the font's decipherability contribution in a font table: $d_f = DF[\text{font specification}]$.

[0412] If the font is to be displayed on the same type of device as was used for the measurement, the font contribution will not require further adjustment for the device. However, if a different display device type is used, then some sort of adjustment is needed. For example, fonts are, in general, much more decipherable when printed on paper than when presented on a CRT display. An example of an adjustment to the font decipherability is to multiply it by an adjustment factor $a_d$ for the display device.

[0413] One way to determine the adjustment factor is as a function of the smallest font size that the device is capable of effectively presenting. The function could, for example, be the ratio of the smallest effective text size for the device used in measuring the font decipherability to the smallest effective text size for the display actually to be used. For example, if the font properties were measured on a CRT that could effectively display only 8-point or larger fonts, but the document was to be printed on paper that could support 4-point fonts or larger, then the device adjustment factor should be 8/4 = 2. One may wish to adjust this factor according to the font size actually used, because the effect of the display may be less important for large text.

[0414] The ease of correctly deciphering a character depends upon familiarity with it. Reading all caps is harder than reading normal text. Numbers and punctuation characters each have their own degree of difficulty. Thus, another adjustment factor $a_c$ for the familiarity of a character should be multiplied in. This adjustment factor can be found from a table indexed by the character code.
[0415] The contrast of the character with the background also contributes to the decipherability. It is harder to decipher light yellow characters on a white background than to decipher black ones. A third adjustment factor is the luminance contrast, which can be calculated as was described above for locatability:

$$a_l = \frac{2\,|Y_b - Y_t|}{Y_b + Y_t}$$

where $Y_b$ is the luminance of the background and $Y_t$ is the luminance of the text.

[0416] The overall decipherability for a character is therefore given by:

$$d_c = d_f\, a_d\, a_c\, a_l$$

[0417] An average overall decipherability $d$ for a string of text can be found by summing the decipherability measures for each character in the string and then dividing by the total count of characters in the string.
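The decipherability chain of [0411] to [0417] can be sketched as follows. Everything table-like here is hypothetical: the font table `DF` and its string keys stand in for experimentally measured values, and `familiarity` uses character classes as a stand-in for the per-character-code table the text describes. The device factor follows the ratio suggested in [0413].

```python
# Hypothetical font table DF: decipherability measured by experiment for a
# fully specified font on the reference device. Keys and values are made up.
DF = {"serif 10pt normal roman": 0.90, "sans 10pt normal roman": 0.85}


def device_adjustment(min_size_reference: float, min_size_target: float) -> float:
    """a_d: smallest effective text size on the measuring (reference) device
    divided by the smallest effective text size on the target device."""
    return min_size_reference / min_size_target


def familiarity(ch: str) -> float:
    """a_c: hypothetical stand-in for a table indexed by character code,
    using character classes instead of individual codes."""
    if ch.islower():
        return 1.0
    if ch.isupper():
        return 0.8   # all caps are harder to read than normal text
    if ch.isdigit():
        return 0.9
    return 0.85      # punctuation, spaces, and everything else


def luminance_contrast(y_b: float, y_t: float) -> float:
    """a_l = 2|Yb - Yt| / (Yb + Yt); returns 0 if both luminances are 0."""
    total = y_b + y_t
    return 0.0 if total == 0 else 2 * abs(y_b - y_t) / total


def char_decipherability(ch: str, font: str, a_d: float, y_b: float, y_t: float) -> float:
    """d_c = d_f * a_d * a_c * a_l for a single character."""
    return DF[font] * a_d * familiarity(ch) * luminance_contrast(y_b, y_t)


def string_decipherability(text: str, font: str, a_d: float, y_b: float, y_t: float) -> float:
    """d: average of d_c over all characters of the string."""
    if not text:
        return 0.0
    return sum(char_decipherability(ch, font, a_d, y_b, y_t) for ch in text) / len(text)


# The CRT-versus-paper example: 8-point minimum versus 4-point minimum.
print(device_adjustment(8, 4))  # 2.0
```

Nothing in this sketch clamps $d_c$ to [0, 1]; with $a_d = 2$ the product can exceed 1, exactly as the formula is stated.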
[0418] The particular methods for evaluating decipherability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining decipherability should be considered within the scope of the present invention, for example, a function of measured human responses to differing document text characteristics with respect to decipherability; such that the present invention is directed not only to the particular method of determining decipherability, but also to the much broader concept of using decipherability measures in the context of evaluating document legibility, communicability and document quality.

[0419] As illustrated in Figure 68, another parameter or factor used in determining legibility is the measurement and quantization of the document's line retrace.
[0420] In a preferred embodiment of the present invention, the second factor contributing to text legibility is the length of the text lines. There is some cost in moving the eye from the end of one line to the start of the next, and the cost increases with the length of the line. This cost is included by multiplying the decipherability by a line retrace factor $r$. An example of a function that can be used for this factor is:

$$r = \frac{B}{n^2 + B}$$

where $B$ is a constant (with a value on the order of 3600) and $n$ is the average number of characters per line.
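The line retrace factor is a one-line formula; the short sketch below, assuming $B = 3600$ as suggested in the text, makes its fall-off with line length concrete.

```python
def line_retrace_factor(avg_chars_per_line: float, b: float = 3600.0) -> float:
    """r = B / (n**2 + B), with n the average number of characters per line.

    With B = 3600, r is 0.5 at n = 60 and falls off for longer lines.
    """
    if avg_chars_per_line < 0:
        raise ValueError("average line length must be non-negative")
    return b / (avg_chars_per_line ** 2 + b)


for n in (30, 60, 120):
    print(n, round(line_retrace_factor(n), 3))
```

Average line lengths of 30, 60 and 120 characters give $r$ = 0.8, 0.5 and 0.2, so $\sqrt{B}$ is the line length at which the retrace penalty halves legibility.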
[0421] In Figure 68, the retracing of the group of lines 1101 makes it more difficult for the reader to find the next line due to the long length of the text lines. On the other hand, in Figure 68, the retracing of the group of lines 1102 makes it easier for the reader to find the next line due to the short length of the text lines.

[0422] The particular methods for evaluating line retrace characteristics with respect to legibility provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining line retrace effects on legibility should be considered within the scope of the present invention, for example, a function of measured human responses to differing document text line characteristics with respect to line retrace and legibility; such that the present invention is directed not only to the particular method of determining line retrace characteristics, but also to the much broader concept of using line retrace measures in the context of evaluating document legibility, communicability and document quality.

[0423] As illustrated in Figure 69, another parameter or factor used in determining legibility is the measurement and 
quantization of the document's relative line separation. 

[0424] In a preferred embodiment of the present invention, the third contribution to legibility is the relative line separation. Increasing the separation between lines acts to improve legibility: it makes it easier for the eye to track correctly from the end of a line to the start of the next line. The effect of line separation is included by means of a line separation factor $s$. An example of a function that can be used is as follows:

$$s = \frac{y}{y + g}$$

where $g$ is a constant (e.g. 0.1) that controls how legibility improves with line separation, and $y$ is a biased relative separation defined by:

$$y = \frac{h_L - h_f}{h_f} + b_s$$

where $h_L$ is the height of the line (baseline to baseline), $h_f$ is the height of the font, and $b_s$ is a small biasing term (e.g. 0.1) to indicate just how far lines must overlap before they become unreadable.
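The line separation factor can be sketched as follows, using the example constants $g = 0.1$ and $b_s = 0.1$ from the text. Clamping $y$ at 0 for lines that overlap by more than $b_s$ is an assumption added here so that the factor never goes negative; the text does not specify that case.

```python
def line_separation_factor(line_height: float, font_height: float,
                           g: float = 0.1, b_s: float = 0.1) -> float:
    """s = y / (y + g) with y = (h_L - h_f) / h_f + b_s.

    line_height is h_L (baseline to baseline); font_height is h_f.
    Assumption: y is clamped at 0, so lines that overlap by more than b_s
    (relative to the font height) score 0 rather than going negative.
    """
    if font_height <= 0 or g <= 0:
        raise ValueError("font height and g must be positive")
    y = (line_height - font_height) / font_height + b_s
    y = max(y, 0.0)
    return y / (y + g)


print(line_separation_factor(12, 10))  # 10-point type on 12-point line spacing
print(line_separation_factor(10, 10))  # lines set solid: 0.5
```

With these constants, 10-point type on 12-point line spacing gives $y = 0.3$ and $s = 0.75$, while type set solid ($h_L = h_f$) gives $s = 0.5$.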

[0425] In Figure 69, the relative line separation of the group of lines 1101 makes it more difficult for the reader to find the next line due to the closely packed text lines. On the other hand, in Figure 69, the relative line separation of the group of lines 1102 makes it easier for the reader to find the next line due to the widely spaced text lines.
[0426] The particular methods for evaluating the effect of relative line separation on legibility provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining line separation effects should be considered within the scope of the present invention, for example, a function of measured human responses to differing document text line spacing characteristics with respect to relative line spacing and legibility; such that the present invention is directed not only to the particular method of determining line spacing, but also to the much broader concept of using line spacing measures in the context of evaluating document legibility, communicability and document quality.

[0427] As illustrated in Figures 70 to 73, another parameter or factor used in determining legibility is the measurement 
and quantization of the document's quadding. 

[0428] In a preferred embodiment of the present invention, legibility is also affected by the quadding (i.e. the alignment and justification of the text). Left-aligned unjustified text is easiest to read, and justified text is almost as easy. Center-aligned text is more difficult, and right-aligned text is the hardest of all. A factor for the effect of the quadding can be stored in a table and looked up for the legibility calculation of text $t$: $q = Q[\mathrm{quadding}(t)]$.

[0429] The particular methods for evaluating the contribution from quadding to legibility provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the quadding contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing document text quadding choices with respect to legibility; such that the present invention is directed not only to the particular method of determining the quadding contribution, but also to the much broader concept of using quadding measurements in the context of evaluating document legibility, communicability and document quality.

[0430] The complete legibility calculation is then given by:

$$V_L = d\, r\, s\, q$$

[0431] This gives the legibility for a particular text element such as a paragraph.

[0432] To arrive at a legibility measurement for an entire document, one must measure the legibility of each paragraph and then combine the results. Combining can be done by a simple average, but it may be preferred to use a non-linear method such that a low legibility score on any paragraph can result in a lower overall score than would be obtained by a simple average. Methods such as the root of the average of powers described above can be used to achieve this effect.
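The paragraph-level product $V_L = d\,r\,s\,q$ and a non-linear document-level combination can be sketched as below. The quadding table `Q` is hypothetical (only its ordering follows [0428]), and the document combiner reuses the shifted root-of-average-of-powers form of [0397] with equal weights, $d = 0.01$ and $p = 2$, which are illustrative choices rather than values from the text.

```python
# Hypothetical quadding table Q, ordered as in [0428]: left-aligned easiest,
# justified almost as easy, centered harder, right-aligned hardest.
Q = {"left": 1.0, "justified": 0.95, "center": 0.8, "right": 0.7}


def paragraph_legibility(d: float, r: float, s: float, quadding: str) -> float:
    """V_L = d * r * s * q for one text element such as a paragraph."""
    return d * r * s * Q[quadding]


def document_legibility(scores: list[float], d: float = 0.01, p: float = 2.0) -> float:
    """Combine paragraph scores with a root of an average of (negative) powers,
    shifted by d, so that one illegible paragraph pulls the total down more
    than a simple average would. Equal weights are assumed."""
    if not scores:
        raise ValueError("need at least one paragraph score")
    mean_power = sum((d + v) ** (-p) for v in scores) / len(scores)
    return mean_power ** (-1.0 / p) - d


scores = [0.9, 0.9, 0.1]
print(sum(scores) / len(scores))    # simple average, about 0.63
print(document_legibility(scores))  # pulled down by the 0.1 paragraph
```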

[0433] Figure 70 illustrates an example of a left-aligned document. Figure 71 illustrates an example of a right-aligned document. Figure 72 illustrates an example of a center-aligned document. Figure 73 illustrates an example of a justified document.

[0434] The particular methods for evaluating document legibility provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the document legibility should be considered within the scope of the present invention, for example, a function of measured human responses to differing text characteristics with respect to legibility; such that the present invention is directed not only to the particular method of determining the legibility, but also to the much broader concept of using a combination of individual measures in the context of evaluating document legibility, communicability and document quality.

[0435] In a preferred embodiment of the present invention, it takes time to decipher text and to understand the concepts. In general, a short road sign communicates more effectively than a long one. The information lightness (the inverse of information density) of a document is included as another factor in how well it communicates. This factor is not nearly as important as legibility and is weighted accordingly.

[0436] A method for calculating information lightness was described in the discussion of eye-catching ability.
[0437] The particular methods for evaluating information density and lightness provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining information lightness should be considered within the scope of the present invention, for example, a function of measured human responses to differing amounts and areas of information; such that the present invention is directed not only to the particular method of determining information density or lightness, but also to the concept of using information lightness measures in the context of evaluating document communicability and document quality.

[0438] In a preferred embodiment of the present invention, the ease with which a document communicates also depends upon the audience for which it was designed. A child's book will probably be easier to follow than a technical manual. The technical level is a measure that estimates this intended degree of sophistication. It can be composed from simple measures that can include reading ease, number fraction, and/or picture fraction. The presence of graphic constructs may also have an effect on the technical level, but it is unclear at this time whether the effect is to increase or decrease it. It has therefore not been included in this example measure.

[0439] A combination of measures, as illustrated in Figure 74, is useful in evaluating the document's technical level.

[0440] More specifically, the technical level, as illustrated in Figure 74, is considered a combination of reading ease, number fraction, and/or picture fraction. In Figure 74, the quantized technical level value is derived by combining the reading ease, number fraction, and/or picture fraction using a technical level quantizer or combiner circuit 53.

[0441] It is noted that, although the illustration shows a circuit for the technical level quantization process, this process may also be performed in software by the microprocessor and/or firmware. The quantization is not limited to specific circuits, but may be carried out by any combination of software and/or hardware that is able to carry out the methodologies described below.
[0442] In a preferred embodiment of the present invention, reading ease is a well-known measure of the readability of a document's text. An example of a reading ease algorithm (the Flesch reading ease formula) is:

$$RE = 206.835 - 0.846\, S_y - 1.015\, W$$

where $S_y$ is the average number of syllables per 100 words and $W$ is the average number of words per sentence.

[0443] For the calculation of technical level, one wants a reading difficulty measure, which can be roughly calculated as:

$$R_d = 0.85\, S_y + W$$
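The reading ease and reading difficulty formulas can be sketched as follows. This assumes the syllable, word, and sentence counts are already available; syllable counting itself is outside the scope of this sketch, and the helper `reading_stats` is a hypothetical convenience for turning raw counts into $S_y$ and $W$.

```python
def reading_stats(n_syllables: int, n_words: int, n_sentences: int) -> tuple[float, float]:
    """Return (S_y, W): syllables per 100 words and words per sentence."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 100.0 * n_syllables / n_words, n_words / n_sentences


def reading_ease(s_y: float, w: float) -> float:
    """RE = 206.835 - 0.846 * S_y - 1.015 * W (higher means easier)."""
    return 206.835 - 0.846 * s_y - 1.015 * w


def reading_difficulty(s_y: float, w: float) -> float:
    """R_d = 0.85 * S_y + W, a rough difficulty measure (higher means harder)."""
    return 0.85 * s_y + w


# 300 syllables in 200 words spread over 10 sentences.
s_y, w = reading_stats(300, 200, 10)         # S_y = 150.0, W = 20.0
print(round(reading_ease(s_y, w), 3))        # 59.635
print(round(reading_difficulty(s_y, w), 3))  # 147.5
```

The difficulty measure keeps roughly the same ratio between the syllable and sentence-length terms as the reading ease formula, but drops the constant and flips the sign so that it grows with difficulty.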

[0444] The particular methods for evaluating the contribution from reading ease to technical level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the reading ease contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing document text elements with respect to reading ease; such that the present invention is directed not only to the particular method of determining the reading ease contribution, but also to the much broader concept of using reading ease measures in the context of evaluating document technical level, communicability and document quality.

[0445] In a preferred embodiment of the present invention, words are easier to comprehend than numbers; a large table of numbers is typically much more difficult to grasp than an equal quantity of words. To capture this, one calculates the number fraction $F_n$, the ratio of the count of numbers to the total count of numbers and words.

[0446] The particular methods for evaluating the contribution from number fraction to technical level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the number fraction contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing amounts of numbers with respect to technical level; such that the present invention is directed not only to the particular method of determining the number fraction contribution, but also to the much broader concept of using number fraction measures in the context of evaluating document technical level, communicability and document quality.

[0447] In a preferred embodiment of the present invention, pictures are used to aid understanding. The use of pictures reduces the technical level measure. Picture fraction was defined above as $F_p = A_p / A_d$, where $A_p$ is the area of the pictures and $A_d$ is the total area of the document.
[0448] One actually needs the inverse behavior of the picture fraction, so that as $F_p$ increases, the technical level decreases. Using $F_{np} = 1 - F_p$ is possible, but a few images can make a big difference in the technical level, while as more images are added, the benefits may fall off. Thus a better choice is a nonlinear function such as:

$$F_{np} = \frac{1}{a_p + F_p}$$

where $a_p$ is a constant near 1.

[0449] The particular methods for evaluating the contribution from picture fraction to technical level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the picture fraction contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing amounts of pictorial elements in a document with respect to technical level; such that the present invention is directed not only to the particular method of determining the picture fraction contribution, but also to the much broader concept of using picture fraction measures in the context of evaluating document technical level, communicability and document quality.

[0450] The technical level measure can then be computed as:

$$T_l = R_d\, F_n\, F_{np}$$

[0451] However, $R_d$ (and therefore $T_l$) is not limited to the range between 0 and 1. This can be remedied by the function:

$$V_{tl} = \frac{T_l}{a_{tl} + T_l}$$

where $a_{tl}$ is a positive constant.
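The pieces of the technical level measure can be assembled as in the sketch below. It assumes $R_d$ has been computed as above and takes $a_p = 1$ by default; the value $a_{tl} = 14.75$ used in the tests and the example counts are hypothetical.

```python
def number_fraction(n_numbers: int, n_words: int) -> float:
    """F_n: numbers as a fraction of all numbers plus words."""
    total = n_numbers + n_words
    return n_numbers / total if total else 0.0


def picture_factor(picture_area: float, document_area: float, a_p: float = 1.0) -> float:
    """F_np = 1 / (a_p + F_p), with F_p = A_p / A_d the picture fraction."""
    if document_area <= 0:
        raise ValueError("document area must be positive")
    f_p = picture_area / document_area
    return 1.0 / (a_p + f_p)


def technical_level(r_d: float, f_n: float, f_np: float, a_tl: float) -> float:
    """V_tl = T_l / (a_tl + T_l), with T_l = R_d * F_n * F_np."""
    t_l = r_d * f_n * f_np
    return t_l / (a_tl + t_l) if t_l > 0 else 0.0


# R_d = 147.5, 10 numbers among 90 words, half the page given to pictures.
f_n = number_fraction(10, 90)         # 0.1
f_np = picture_factor(50.0, 100.0)    # 1 / 1.5
print(technical_level(147.5, f_n, f_np, a_tl=14.75))
```

With the product written as in [0450], a document containing no numbers gets $F_n = 0$ and hence $V_{tl} = 0$; the sketch reproduces that behavior rather than altering the formula.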

[0452] The particular methods for evaluating document technical level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the document technical level should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to technical level; such that the present invention is directed not only to the particular method of determining the technical level, but also to the much broader concept of using a combination of individual measures in the context of evaluating document technical level, communicability and document quality.

[0453] As illustrated in Figures 75 to 77, another parameter or factor used in determining communicability is the measurement and quantization of the document's text and image balance.

[0454] In a preferred embodiment of the present invention, when considering technical level, it was assumed that the more images, the lower the level (although with diminishing returns). For communicability, however, this rule may not apply in general. If a document is composed solely of images without any textual explanation, it may be difficult to be sure of the author's message. A rule of design is that about equal amounts of document area should ideally be devoted to text and to illustration. The difference between the areas is a measure of the imbalance, and an inverse can be applied to give a balance measure. For example, if the total area devoted to text is $A_t$ and the total area devoted to pictures is $A_p$, then a measure of the text and image balance is given by $V_{tib} = 1 - |A_t - A_p| / (A_t + A_p)$.
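A minimal Python sketch of the text and image balance measure follows. The function name is hypothetical, and returning 0 when there is neither text nor pictures is an assumption, since the formula itself is undefined in that case.

```python
def text_image_balance(text_area: float, picture_area: float) -> float:
    """V_tib = 1 - |A_t - A_p| / (A_t + A_p); 1 means equal areas."""
    total = text_area + picture_area
    if total == 0:
        return 0.0  # assumption: no text and no pictures earns no balance credit
    return 1.0 - abs(text_area - picture_area) / total
```

Equal text and picture areas give 1, a page of text alone gives 0, and a 3:1 split gives 0.5.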

[0455] Figures 75 and 76 illustrate examples of poor text and image balance, and Figure 77 illustrates an example of good text and image balance.

[0456] The particular methods for evaluating the contribution from text and image balance to communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the text and image balance contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing ratios of document text and image elements with respect to communicability; such that the present invention is directed not only to the particular method of determining the text and image balance contribution, but also to the much broader concept of using text and image balance measures in the context of evaluating document communicability and document quality.

[0457] In a preferred embodiment of the present invention, another aspect of how well a document communicates is its ability to serve viewers with handicaps or impairments. An example of this is whether the document can be used by the fraction of men who are red-green colorblind. One element of red-green friendliness is checking that an object's color and its background color differ by more than just a red-green contrast. Luminance contrast and blue-yellow contrast are the mechanisms by which the colorblind can distinguish foreground objects from the background. Step through the document examining the foreground and background colors for each object. If a color is specified by its red, green and blue components (R, G, B), then the luminance and luminance contrast $C_Y$ can be calculated as described above.
[0458] The blue-yellow contrast can be calculated from the S chrominance component, defined as $S = (R + G)/2 - B$.
[0459] The blue-yellow contrast is calculated similarly to the luminance case as $C_{by} = 2|S_f - S_b| / (2 + S_f + S_b)$, where $S_f$ and $S_b$ are the foreground and background S chrominance components, respectively.

[0460] The red-green friendliness of an object can be estimated by combining the luminance and blue-yellow chrominance contrast components: $F_{rg} = (C_Y + C_{by}) / 2$.
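The per-object computation in [0457] to [0460] can be sketched in Python as follows. The sketch assumes R, G and B lie in [0, 1] and takes the luminance contrast $C_Y$ as an input, because its formula is given earlier in the document rather than here; all names are hypothetical. The zero-denominator guard is an addition that only triggers when both colors are pure blue.

```python
from typing import Tuple

Color = Tuple[float, float, float]  # (R, G, B), each assumed to lie in [0, 1]


def s_chrominance(color: Color) -> float:
    """Blue-yellow chrominance S = (R + G) / 2 - B."""
    r, g, b = color
    return (r + g) / 2 - b


def blue_yellow_contrast(fg: Color, bg: Color) -> float:
    """C_by = 2 |S_f - S_b| / (2 + S_f + S_b)."""
    s_f, s_b = s_chrominance(fg), s_chrominance(bg)
    denom = 2 + s_f + s_b
    if denom == 0:
        return 0.0  # only when both colors are pure blue, i.e. identical S
    return 2 * abs(s_f - s_b) / denom


def red_green_friendliness(luminance_contrast: float, fg: Color, bg: Color) -> float:
    """F_rg = (C_Y + C_by) / 2, with C_Y computed elsewhere and passed in."""
    return (luminance_contrast + blue_yellow_contrast(fg, bg)) / 2
```

Pure red on pure green, for instance, gives $S = 0.5$ for both colors, so $C_{by} = 0$ and the object's friendliness rests entirely on its luminance contrast.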

[0461] A weighted average can also be used to combine the contrast components. 

[0462] For the entire document, some mechanism is needed for combining the red-green friendliness values of all document objects. One way to do this is to average the values weighted by the corresponding object areas. If $F_{rg,i}$ is the red-green friendliness of the $i$th object and $A_i$ is its area, then the average is given by $V_{rg} = \sum_i F_{rg,i} A_i / \sum_i A_i$, where the sums are over all objects.

[0463] However, a single small object or set of objects that is difficult to decipher can have a large impact on the overall understanding of the document. Thus, some method other than weighting by area may be preferred for combining friendliness values. An alternative is to take the minimum value: $V_{rg} = \min_i F_{rg,i}$.
[0464] A third approach combines features of the above two methods. The values are weighted by area, but they are raised to a power in a way that emphasizes low values: $V_{rg} = \left( \frac{\sum_i (d_{rg} + F_{rg,i})^{-p} A_i}{\sum_i A_i} \right)^{-1/p} - d_{rg}$, where $d_{rg}$ is a positive constant near zero and $p$ is a positive power of 1 or greater.
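The three ways of combining per-object friendliness values ([0462] to [0464]) can be compared side by side in the Python sketch below. The defaults $p = 2$ and $d_{rg} = 10^{-3}$ are assumed example constants, and the function names are hypothetical.

```python
from typing import Sequence


def area_weighted_mean(values: Sequence[float], areas: Sequence[float]) -> float:
    """V_rg = sum(F_i A_i) / sum(A_i)."""
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)


def worst_case(values: Sequence[float]) -> float:
    """V_rg = min(F_i): a single hard-to-read object dominates."""
    return min(values)


def low_emphasis_mean(values: Sequence[float], areas: Sequence[float],
                      p: float = 2.0, d: float = 1e-3) -> float:
    """V_rg = (sum((d + F_i)^-p A_i) / sum(A_i))^(-1/p) - d.

    p >= 1 controls how strongly low values dominate; d > 0 (near zero)
    avoids raising zero to a negative power. Defaults are assumptions.
    """
    s = sum((d + v) ** (-p) * a for v, a in zip(values, areas)) / sum(areas)
    return s ** (-1.0 / p) - d
```

With friendliness values (1.0, 0.1) on areas (9, 1), the area-weighted mean is 0.91 and the minimum is 0.1, while the low-emphasis mean with $p = 2$ falls in between at about 0.30, so the small, poorly distinguishable object is not averaged away.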

[0465] Other methods of combining the friendliness values are also possible. The particular methods for evaluating the contribution from red-green friendliness to communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the red-green friendliness, or other document characteristics that support users with handicaps, should be considered within the scope of the present invention, for example, a function of measured color-blind human responses to differing colors with respect to communicability; such that the present invention is directed not only to the particular method of determining the red-green friendliness contribution, or other handicap compensation characteristic, but also to the much broader concept of using handicap compensation measures in the context of evaluating document communicability and document quality.

[0466] In a preferred embodiment of the present invention, one more property that has a bearing on the communicability of a document is the ease of progression, as illustrated in Figure 78. Ease of progression measures the difficulty in progressing from one document component to the next component in logical order; for example, in moving from the bottom of one column to the top of the next. An estimate of the ease of progression is calculated as a composite of several properties, each of which aids in the progression process. These properties include distinguishability, group identity, spatial coherence, list bullets, progression links, headings, alignment, white space, consistency of scan, and/or consistency of order.

[0467] These contributing factors are combined using a weighted average, since they are not all equally important:
$V_{ep} = w_{ds} V_{ds} + w_{gi} V_{gi} + w_{sc} V_{sc} + w_{lb} V_{lb} + w_{plk} V_{plk} + w_{hd} V_{hd} + w_{al} V_{al} + w_{ws} V_{ws} + w_{cs} V_{cs} + w_{co} V_{co}$
where the $w$'s are the weights and the $V$'s are the contributing factors.
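The weighted average of [0467] reduces to a few lines of Python. In this sketch the factors are keyed by the subscripts used in the formula (`"ds"`, `"gi"`, and so on), which is a naming assumption. Dividing by the weight total is also an assumption: it changes nothing when the weights already sum to 1, and it keeps the result in [0, 1] for factors in [0, 1] and non-negative weights otherwise.

```python
from typing import Dict


def weighted_combination(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of contributing factors, e.g. V_ep = sum(w_k V_k).

    Only the keys present in `values` are combined; each must have a weight.
    """
    total_w = sum(weights[k] for k in values)
    return sum(weights[k] * values[k] for k in values) / total_w
```

The weights themselves are design choices that the text leaves open; for instance, spatial coherence might be weighted differently here than it is inside group identity.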

[0468] Note that alternative methods of combination are possible. The particular methods for evaluating document ease of progression provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the document ease of progression should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to ease of progression; such that the present invention is directed not only to the particular method of determining the ease of progression, but also to the much broader concept of using a combination of individual measures in the context of evaluating document ease of progression, communicability and document quality.

[0469] A combination of measures, as illustrated in Figure 78, is useful in evaluating the document's ease of progression.

[0470] More specifically, the ease of progression, as illustrated in Figure 78, is considered a combination of distinguishability, group identity, spatial coherence, list bullets, progression links, headings, alignment, white space, consistency of scan, and/or consistency of order. In Figure 78, the quantized ease of progression value is derived by combining the distinguishability, group identity, spatial coherence, list bullets, progression links, headings, alignment, white space, consistency of scan, and/or consistency of order using an ease of progression quantizer or combiner circuit 54.

[0471] Although the illustration shows a circuit for the ease of progression quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described below may be used.
[0472] In a preferred embodiment of the present invention, the distinguishability property, indicating how well one can distinguish an element from its neighbors; the group identity property, indicating how easy it is to tell which objects belong to a logical group and which do not; the spatial coherence property, measuring how closely packed together the members of a group are; and headings, which describe the logical structure, were defined above in the discussion of the group contribution to ease of use. These factors also contribute to how well the document communicates, but with weights that reflect their different relative importance. Spatial coherence is singled out here because it has particular relevance to ease of progression, and one may wish to give its contribution a different weight from that entering via group identity.
[0473] The headings measure described above combined headings, list bullets and list numbers into one measure, but one can leave out the checks for list bullets and numbers and adapt the method to look at headings alone. This allows headings and list bullets to be calculated separately and weighted independently.

[0474] The particular methods for evaluating the contribution from distinguishability, group identity, and headings to ease of progression provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining these contributions should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to distinguishability, group identity, or headings; such that the present invention is directed not only to the particular method of determining these contributions, but also to the much broader concept of using distinguishability, group identity and/or heading measures in the context of evaluating document ease of progression, communicability, and document quality.
[0475] In a preferred embodiment of the present invention, bullets and numbers in lists help to identify the list elements and to progress through them. Documents that use bulleted and/or numbered lists should be easier to progress through than those that do not. A method to calculate a measure for this property is to count the total number of list bullets $N_{lb}$ or numbers $N_{ln}$ and divide by the total number of list elements $N_{le}$: $V_{lb} = (N_{lb} + N_{ln}) / N_{le}$.

[0476] Since there is less chance of confusing two list numbers than of confusing two list bullets, one may wish to weight the benefit of list numbers more heavily than that of bullets. This can easily be done by weighting the counts of bullets and numbers differently when they are combined into the numerator of the ratio to total list elements: $V_{lb} = (a_{lb} N_{lb} + a_{ln} N_{ln}) / N_{le}$, where $a_{lb}$ and $a_{ln}$ are the constant weights applied to the count of bullets and the count of list numbers, respectively.
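Both forms of the list marker measure fit in one small Python function, since [0475] is the special case $a_{lb} = a_{ln} = 1$ of [0476]. The function name is hypothetical, and returning 0 for a document without list elements is an assumption.

```python
def list_marker_measure(n_bullets: int, n_numbers: int, n_elements: int,
                        a_lb: float = 1.0, a_ln: float = 1.0) -> float:
    """V_lb = (a_lb N_lb + a_ln N_ln) / N_le.

    With a_lb = a_ln = 1 this is the unweighted form. If each list element
    carries at most one marker, weights <= 1 keep the result in [0, 1].
    """
    if n_elements == 0:
        return 0.0  # assumption: no list elements, no credit
    return (a_lb * n_bullets + a_ln * n_numbers) / n_elements
```

For example, two bulleted and two numbered elements out of four, with $a_{lb} = 0.5$ and $a_{ln} = 1$, give $V_{lb} = 0.75$.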

[0477] Alternatively, one may wish to calculate separate and independent measures for the fraction of bulleted ele- 
ments and the fraction of numbered elements. 

[0478] The particular methods for evaluating the contribution from list bullets and numbers to ease of progression and communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the list bullet and number contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing list bullet and number specifications with respect to ease of progression and communicability; such that the present invention is directed not only to the particular method of determining the list bullet and number contribution, but also to the much broader concept of using list bullet and number measures in the context of evaluating document ease of progression, communicability and document quality.

[0479] In a preferred embodiment of the present invention, internal references (such as "continued on page 7") serve to guide the reader when the intended progression differs from basic convention. Electronic documents can include hyperlinks that perform the same function of guiding the reader. A simple measure of how helpful the document is in guiding the reader is a count $N_L$ of such hyperlinks and/or references. This count should be divided by some measure of the size of the document (such as the number of content objects $N_O$) in order to get a link density: $V_{plk} = N_L / N_O$.

EP 1 503 336 A2

[0480] A better measure may be obtained by dividing the count of the references by a count $N_{SO}$ of all the points at which the progression does not follow the typical scan order. The conventional western scan order is that the next logical content element should be aligned with, and to the right of or below, the current object. One can examine the positions of the content elements in their logical order and count the instances when this rule is not followed. These are the cases where a reference to redirect the reader would be most helpful, and one can calculate the ratio of references to breaks in scan order. This will typically be a number between 0 and 1, but it is not guaranteed to be confined to values of 1 or less. To restrict the range, functions such as those used above for confining the range can be used, but in this case simply clamping the value to 1 should be sufficient: $V_{plk} = \min(1, N_L / N_{SO})$.
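The two progression link measures of [0479] and [0480] are sketched below in Python. The counts are assumed to have been gathered already (detecting scan-order breaks is a separate step), the function names are hypothetical, and returning 1 when there are no scan-order breaks is an assumption, since the ratio is undefined in that case.

```python
def link_density(n_links: int, n_objects: int) -> float:
    """Simple link density: V_plk = N_L / N_O."""
    return n_links / n_objects


def progression_link_measure(n_links: int, n_scan_breaks: int) -> float:
    """Refined measure: V_plk = min(1, N_L / N_SO)."""
    if n_scan_breaks == 0:
        return 1.0  # assumption: no breaks in scan order, nothing needs redirecting
    return min(1.0, n_links / n_scan_breaks)
```

Clamping means a document with more redirecting links than scan-order breaks simply scores 1 rather than exceeding the range.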

[0481] The particular methods for evaluating the contribution from progression links to ease of progression and communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the progression link contribution should be considered within the scope of the present invention, for example, a function of measured human responses to the presence of progression link specifications with respect to ease of progression and communicability; such that the present invention is directed not only to the particular method of determining the progression link contribution, but also to the much broader concept of using progression link measures in the context of evaluating document ease of progression, communicability and document quality.
[0482] In a preferred embodiment of the present invention, it is easier to follow the conventional rules of progression (e.g. the next logical element is located directly below the current element) if the elements are aligned. This makes it clear just which element is below and which is to the right of the current element. A measure of the document alignment $V_{al}$ was described above in the discussion of document aesthetics.

[0483] The particular methods for evaluating the contribution from alignment to ease of progression and communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the alignment contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing alignment specifications with respect to ease of progression and communicability; such that the present invention is directed not only to the particular method of determining the alignment contribution, but also to the much broader concept of using alignment measures in the context of evaluating document ease of progression, communicability and document quality.

[0484] In a preferred embodiment of the present invention, documents with lots of white space are typically less crowded, so it is easier to distinguish and follow the elements. Thus, a large amount of white space can provide a small contribution to the overall ease of progression. The non-white-space area can be estimated by totaling the areas of the content objects ($A_i$ for content object $i$), and the result can be scaled by the total document area $A_d$: $V_{ws} = (A_d - \sum_i A_i) / A_d$.
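A direct Python transcription of the white space measure is shown below; the function name is hypothetical. Like the plain sum in the formula, it counts overlapping object areas twice, so heavily overlapping layouts can push the value below 0. Computing the union of object rectangles would avoid this, but that refinement is not part of the text.

```python
from typing import Iterable


def white_space_measure(object_areas: Iterable[float], document_area: float) -> float:
    """V_ws = (A_d - sum(A_i)) / A_d, the fraction of the page not covered by objects."""
    return (document_area - sum(object_areas)) / document_area
```

Two objects of area 20 and 30 on a page of area 100 give $V_{ws} = 0.5$.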

[0485] The particular methods for evaluating the contribution from white space to ease of progression and communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the white space contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing white space specifications with respect to ease of progression and communicability; such that the present invention is directed not only to the particular method of determining the white space contribution, but also to the much broader concept of using white space measures in the context of evaluating document ease of progression, communicability and document quality.

[0486] In a preferred embodiment of the present invention, one of the conventions for progression through western documents is the scan positioning of left to right, top to bottom. This is the convention followed by text, but it can also be applied to other objects (such as the panes in a comic book). For this convention, one expects the items to have about the same height and to be aligned in rows. The left edges of the rows should be vertically aligned. One can construct a measure that indicates the deviation from this rule. The inverse of this deviation measure then gives the adherence to the rule.

[0487] Step through the document elements in their logical order. For each element, find a bounding box that contains the object and indicates the position of its top $y_t$, bottom $y_b$, left side $x_l$ and right side $x_r$. As one steps through the objects, the vertical position of the new object ($y_{tn}$, $y_{bn}$) is compared with that of the old object ($y_{to}$, $y_{bo}$). Objects should be placed to the right and below, but not above, so a deviation amount should be added to a deviation accumulation $d_{cs}$ for the degree to which the new object is above the old. The following expression does this (assuming the y coordinates increase as one moves down the page):

    if ytn < yto and ybn < ybo
        then dcs = dcs + (yto - ytn) * (ybo - ybn) / (ybo - ytn)^2

[0488] If the new object is vertically in the same row as the old object, then one expects it to be located to the right of the old object. The degree to which it is left of the old object is the amount by which it deviates from the scan order model. One can calculate this deviation with the following expression:

    if ytn <= ybo and xln < xlo and xrn < xro
        then dcs = dcs + (xlo - xln) * (xro - xrn) / (xro - xln)^2

[0489] These calculations are carried out for each consecutive pair of content elements as one steps through the document in logical order. The result is then normalized by dividing by the number of pair comparisons (the number of elements minus 1) and clamped to 1. The inverse is then returned:
$V_{cs} = 1 - \min(1, d_{cs} / (N_O - 1))$
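The two tests of [0487] and [0488], together with the normalization of [0489], fit in one short Python function. In this sketch each element is a hypothetical `Box(xl, yt, xr, yb)` tuple with y increasing down the page, the boxes are listed in logical order and are assumed non-degenerate ($x_l < x_r$, $y_t < y_b$), and returning 1.0 for fewer than two elements is an assumption, since the normalization divides by $N_O - 1$.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    xl: float  # left
    yt: float  # top (y increases down the page)
    xr: float  # right
    yb: float  # bottom


def consistency_of_scan(boxes: Sequence[Box]) -> float:
    """V_cs for the row-wise (left-to-right, top-to-bottom) reading model."""
    n = len(boxes)
    if n < 2:
        return 1.0  # assumption: a single element cannot break the order
    dcs = 0.0
    for old, new in zip(boxes, boxes[1:]):
        # Penalty for moving up the page relative to the previous element.
        if new.yt < old.yt and new.yb < old.yb:
            dcs += (old.yt - new.yt) * (old.yb - new.yb) / (old.yb - new.yt) ** 2
        # Penalty for moving left within the same row.
        if new.yt <= old.yb and new.xl < old.xl and new.xr < old.xr:
            dcs += (old.xl - new.xl) * (old.xr - new.xr) / (old.xr - new.xl) ** 2
    return 1.0 - min(1.0, dcs / (n - 1))
```

For example, reading a two-column page down the left column and then jumping back up to the top of the right column incurs one upward-move penalty of $4/9$ (for 10-unit boxes spaced as in the test below) over two pairs, so $V_{cs} = 7/9$.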

[0490] Figure 79 illustrates an example of the placement for consistency of scan. 

[0491] The particular methods for evaluating the contribution from the consistency of scan to ease of progression and communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the consistency of scan contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing layouts of ordered content with respect to ease of progression and communicability; such that the present invention is directed not only to the particular method of determining the consistency of scan contribution, but also to the much broader concept of using consistency of scan measures in the context of evaluating document ease of progression, communicability and document quality.
[0492] In a preferred embodiment of the present invention, an alternative model for progression order is top to bottom, left to right. This is, for example, the order typically used for the layout of a story in a newspaper or magazine. One moves down a column to the bottom, and then shifts to the top of the next column to the right. One can calculate the deviation from this ordering in a manner similar to the scan ordering calculation above. In this case, however, one never wants to place an object to the left of a previous object, and objects in the same column should not be placed above previous items. The corresponding tests are as follows:

    if xln < xlo and xrn < xro
        then dco = dco + (xlo - xln) * (xro - xrn) / (xro - xln)^2

and

    if xln <= xro and ytn < yto and ybn < ybo
        then dco = dco + (yto - ytn) * (ybo - ybn) / (ybo - ytn)^2

and

$V_{co} = 1 - \min(1, d_{co} / (N_O - 1))$

[0493] Note that an alternative to adding the consistency of scan and consistency of order terms independently to the ease of progression expression, as shown above, is to first combine the two measures and then use the result in the ease of progression. The reason for doing this is that the two measures could be combined in such a way that if either of them had a high value, the combined value would be high. In other words, the document would need to follow one or the other of the layout models, but not necessarily both.
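The column-wise tests of [0492] and the "either model" idea of [0493] are sketched below in Python, using a hypothetical `Box(xl, yt, xr, yb)` tuple (y increasing down the page, boxes in logical order, assumed non-degenerate). The text does not name a combining function for [0493]; taking the maximum is one hypothetical choice that rewards a layout for following either model.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    xl: float  # left
    yt: float  # top (y increases down the page)
    xr: float  # right
    yb: float  # bottom


def consistency_of_order(boxes: Sequence[Box]) -> float:
    """V_co for the column-wise (top-to-bottom, left-to-right) reading model."""
    n = len(boxes)
    if n < 2:
        return 1.0  # assumption: a single element cannot break the order
    dco = 0.0
    for old, new in zip(boxes, boxes[1:]):
        # Penalty for moving left of the previous element.
        if new.xl < old.xl and new.xr < old.xr:
            dco += (old.xl - new.xl) * (old.xr - new.xr) / (old.xr - new.xl) ** 2
        # Penalty for moving up within the same column.
        if new.xl <= old.xr and new.yt < old.yt and new.yb < old.yb:
            dco += (old.yt - new.yt) * (old.yb - new.yb) / (old.yb - new.yt) ** 2
    return 1.0 - min(1.0, dco / (n - 1))


def either_layout_model(v_cs: float, v_co: float) -> float:
    """Hypothetical 'either model is fine' combination: take the larger value."""
    return max(v_cs, v_co)
```

A two-column page read down the left column and then from the top of the right column scores 1.0 under this model, while jumping from the right column back to the left one is penalized.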
[0494] Figure 80 illustrates an example of the placement for consistency of order. 

[0495] The particular methods for evaluating the contribution from the consistency of order to ease of progression and communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the consistency of order contribution should be considered within the scope of the present invention, for example, a function of measured human responses to differing layouts of ordered content with respect to ease of progression and communicability; such that the present invention is directed not only to the particular method of determining the consistency of order contribution, but also to the much broader concept of using consistency of order measures in the context of evaluating document ease of progression, communicability and document quality.
[0496] In a preferred embodiment of the present invention, a property similar to ease of progression is ease of navigation. While progression measures the ease or difficulty of moving through the document in the order intended by the creator, ease of navigation measures the ability to locate an arbitrary element of the document. In estimating the ease of navigation, one looks mainly for those features that can aid in finding an element or section. The example method provided here includes headings, list bullets and numbers, running heads and page numbers, internal links, and/or group identity. These properties each contribute to the ease of navigation, and an overall measure can be created from a weighted average:
$V_{en} = w_{hd} V_{hd} + w_{lb} V_{lb} + w_{rh} V_{rh} + w_{lnk} V_{lnk} + w_{gi} V_{gi}$
where the $w$'s are the weights and the $V$'s are the property values. Note that alternative methods of combination, as well as additional contributing factors, are possible. Many of the properties were also used for ease of progression, but the weights used in calculating the ease of navigation may be different.

[0497] A combination of measures, as illustrated in Figure 81, is useful in evaluating the document's ease of navigation.

[0498] More specifically, the ease of navigation, as illustrated in Figure 81, is considered a combination of headings, list bullets and numbers, running heads and page numbers, internal links, and/or group identity. In Figure 81, the quantized ease of navigation value is derived by combining the headings, list bullets and numbers, running heads and page numbers, internal links, and/or group identity using an ease of navigation quantizer or combiner circuit 55.

[0499] Although the illustration shows a circuit for the ease of navigation quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described below may be used.
[0500] The particular methods for evaluating document ease of navigation provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the document ease of navigation should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to ease of navigation; such that the present invention is directed not only to the particular method of determining the ease of navigation, but also to the much broader concept of using a combination of individual measures in the context of evaluating document ease of navigation, communicability and document quality.
[0501] In a preferred embodiment of the present invention, running heads and page numbers can help greatly in navigating a document. For running heads, a measure of their value is the number of different heads divided by the number of pages. One can find this by examining the document for the heads and making a list of the distinct ones, then counting the number of heads in the list. For page numbers, one just asks whether or not they are present, and if they are, one can add a contribution to the measure: $V_{rh} = w_h N_h / N_p + (1 - w_h) B_{pn}$, where $w_h$ is the weight given to running heads, $N_h$ is the number of distinct heads, $N_p$ is the number of pages in the document, and $B_{pn}$ is 1 if there are page numbers and 0 otherwise.
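The running head and page number measure can be sketched in Python as follows. The input format, a list giving the running head found on each page (or `None` if the page has none), and the default $w_h = 0.5$ are assumptions made for illustration; the function name is hypothetical.

```python
from typing import Optional, Sequence


def running_head_measure(heads_per_page: Sequence[Optional[str]],
                         has_page_numbers: bool, w_h: float = 0.5) -> float:
    """V_rh = w_h N_h / N_p + (1 - w_h) B_pn.

    heads_per_page holds the running head found on each page (None if the
    page has none); w_h = 0.5 is an assumed default weight.
    """
    n_pages = len(heads_per_page)                         # N_p
    n_distinct = len({h for h in heads_per_page if h})    # N_h
    b_pn = 1.0 if has_page_numbers else 0.0               # B_pn
    return w_h * n_distinct / n_pages + (1 - w_h) * b_pn
```

A four-page document with running heads "Intro", "Intro", "Methods", "Results" and page numbers scores $0.5 \cdot 3/4 + 0.5 = 0.875$.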

[0502] The particular methods for evaluating the contribution from page numbers to ease of navigation and communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from page numbers should be considered within the scope of the present invention, for example, a function of measured human responses to the presence or absence of page numbers with respect to ease of navigation and communicability; such that the present invention is directed not only to the particular method of determining the page number contribution, but also to the much broader concept of using page number measures in the context of evaluating document ease of navigation, communicability and document quality.

[0503] In a preferred embodiment of the present invention, ease of navigation is strongly related to the locatability property for group elements that was described above in the discussion on the ease of use of groups. The measures of headings, list bullets and numbers, and internal links can be captured as described.

[0504] In the discussion on ease of progression, one measured the fraction of progression links. For ease of navigation, one wants to count the total number of internal links or references (not just the progression ones). This includes the entries in a table of contents and in an index, as well as references or links within the main body of the document. As suggested above, one can normalize the count by dividing by the number of content objects: $V_{lnk} = \min(1, N_{LT} / N_O)$, where $N_{LT}$ is the total number of internal links and $N_O$ is the number of content objects.

[0505] In trying to find one's way around in a document, it is helpful to know when one group of content ends and another begins. Thus, there should be a contribution to the ease of navigation from the group identity measure. This is another measure that is also used in the ease of progression estimation. A measure of group identity was described in the above discussion of ease of use of groups. Group identity is calculated from other measures such as spatial coherence, the presence of borders or backgrounds, style uniformity, and alignment of elements.

[0506] The particular methods for evaluating the contribution from headings, bullets, internal links and group identity to ease of navigation and communicability provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from these properties should be considered within the scope of the present invention, for example, a function of measured human responses to different document characteristics with respect to these properties, ease of navigation and communicability; such that the present invention is directed not only to the particular method of determining the contributions, but also to the much broader concept of using heading, bullet, internal link and group identity measures in the context of evaluating document ease of navigation, communicability and document quality.

COMFORT 

[0507] In a preferred embodiment of the present invention, another property that contributes to the quality of a document is the comfort level at which the document is perceived. A method for quantifying the document comfort level will be described next.

[0508] Comfort is calculated as a combination of simpler properties or rules. Violating any of the component rules can result in discomfort and ruin the overall comfort of the document layout. Component rules can include limitation of font forms, limitation of colors, grouping number, neatness, decipherability, non-intimidating, conventionality, color harmony, color appropriateness, consistency of luminance, and/or consistency of size. Each rule is defined to produce a value between 0 and 1, where 0 means a low or bad comfort value and 1 means a high or good comfort value. These (and possibly other such rules) can be calculated and combined to form an overall comfort measure. If $V_i$ is the value calculated for the $i$th rule, then the comfort measure $V_c$ is formed as a function $E$ of these contributions:
$V_c = E(V_{lf}, \ldots, V_{dc}, V_{ni}, V_{cv}, V_{ch}, V_{ca}, \ldots, V_{csz})$

[0509] The combining function $E$ can be as simple as a weighted average of the contributions, but because any bad contributor can ruin the comfort no matter how good the others are, a linear combination is not preferred. An alternative is to use $V_c = \left[ \sum_i w_i (d + V_i)^{-p} \right]^{-1/p} - d$. The $w_i$ factors are the weights that specify the relative importance of each rule; they should sum to 1. The exponent $p$ introduces the nonlinearity that can make one bad value overwhelm many good ones; the larger $p$ is, the greater this effect. The constant $d$ is a positive number near 0 and guards against division by 0.
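The nonlinear combination of [0509] can be sketched in Python as shown below. The constants $p = 4$ and $d = 10^{-3}$ are assumed example values, the weights are assumed to sum to 1 as the text requires, and the function name is hypothetical.

```python
from typing import Sequence


def comfort_power_mean(values: Sequence[float], weights: Sequence[float],
                       p: float = 4.0, d: float = 1e-3) -> float:
    """V_c = [sum(w_i (d + V_i)^-p)]^(-1/p) - d, with the w_i summing to 1.

    p = 4 and d = 1e-3 are assumed example constants.
    """
    s = sum(w * (d + v) ** (-p) for v, w in zip(values, weights))
    return s ** (-1.0 / p) - d
```

With four equally weighted rules scoring 1, 1, 1 and 0, a weighted average would give 0.75, whereas this combination gives a value below 0.001, so the single failing rule dominates; when all rules score the same value, the result reduces to that value.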

25 [0510] Other combining functions are possible; for example, one could take the product of the contributions. If weight- 
ing of the contribution is desired, this can be done by exponentiation (using a different set of weight values). V c =nVj wi ' 
Note that the set of rules chosen is illustrative of how a comfort measure can be constructed. Other factors contributing 
to comfort exist and could certainly be included in a more sophisticated quantification of comfort. The particular methods 
for evaluating document comfort provided herein are exemplary and are not to be considered as limiting in scope. 

Other methods for determining the document comfort should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to the feeling of comfort; the present invention is directed not only to the particular method of determining the comfort level, but also to the much broader concept of using a combination of individual measures in the context of evaluating document comfort level and document quality.
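The product form of [0510] can be sketched the same way. In this illustrative Python version the exponents play the role of the separate weight set; the function name is hypothetical. A single zero-valued rule (with a positive exponent) drives the whole product to 0, which matches the idea that one bad contributor can ruin comfort.

```python
import math


def combine_product(values, exponents):
    """Multiplicative combiner: prod_i V_i ** w_i (requires Python 3.8+)."""
    if len(values) != len(exponents):
        raise ValueError("values and exponents must have the same length")
    return math.prod(v ** w for v, w in zip(values, exponents))


print(combine_product([0.5, 1.0], [2.0, 1.0]))  # 0.25
print(combine_product([0.9, 0.0], [1.0, 1.0]))  # 0.0
```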

[0511] A combination of measures, as illustrated in Figure 82, is useful in evaluating the document's comfort.

[0512] More specifically, the comfort, as illustrated in Figure 82, is considered a combination of limitation of font forms, limitation of colors, grouping number, neatness, decipherability, non-intimidating, conventionality, color harmony, color appropriateness, consistency of luminance, and/or consistency of size. In Figure 82, the quantized comfort value is derived by combining the limitation of font forms, limitation of colors, grouping number, neatness, decipherability, non-intimidating, conventionality, color harmony, color appropriateness, consistency of luminance, and/or consistency of size using a comfort quantizer or combiner circuit 60.

[0513] Although the illustration shows a circuit for the comfort quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described below may be used.
[0514] In a preferred embodiment of the present invention, fonts have many properties that can be selected to achieve different effects. Font families can be chosen to give the document different feelings, from formal to playful, light to serious, modern to classical. Font size can affect the cost and legibility. Font weights, such as bold, can convey importance; font styles, such as italic, can indicate that the text is special. Font variants such as strikethrough or outlined can add further meaning.

[0515] If, however, a single document contains too many different font forms, the result is disquieting. Such "ransom note" documents are considered bad style because they lead to discomfort in the reader. The first factor considered as contributing to viewer comfort is the limitation of the number of font forms. Any change in the font specification (family, size, weight, style, or variant) yields a new form. The document can be examined, and the number of distinct font forms $N_f$ can be counted. This can be converted to a number ranging from near 0 (for the case of many font forms) to 1 (when there is no more than a single font form) by the expression:

$V_{lf} = 1 / \max(1, N_f)$
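A minimal Python sketch of this simple count, assuming the text has already been extracted into runs that each carry their full font specification. The `FontForm` record and the function name are hypothetical; two runs share a form only if all five properties match, mirroring the rule that any change yields a new form.

```python
from typing import Iterable, NamedTuple


class FontForm(NamedTuple):
    """Hypothetical record of one complete font specification."""
    family: str
    size: float
    weight: str
    style: str
    variant: str


def font_limitation_simple(runs: Iterable[FontForm]) -> float:
    """V_lf = 1 / max(1, N_f), where N_f counts distinct font forms."""
    n_f = len(set(runs))  # NamedTuples are hashable, so a set deduplicates
    return 1.0 / max(1, n_f)


body = FontForm("Times", 10, "normal", "normal", "normal")
head = FontForm("Times", 14, "bold", "normal", "normal")
print(font_limitation_simple([body, body, head]))  # 0.5
```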

[0516] However, more sophisticated measures are possible. One can, for example, include as part of the measure 
just how different the fonts are from one another. This can be done by first constructing a list, F, of all the font forms 
that appear in the document. One can then compare every font form in the list to every other font form and accumulate 
a measure of their differences. For fonts of different sizes, one can make the measure a function of the size difference (such as its absolute value). For font weights, one can add to the measure a function of the weight difference. Since weights are usually limited to a small set of choices, a table $FW[\mathrm{weight}(f_1), \mathrm{weight}(f_2)]$ can be used to describe the weight difference function. Contributions due to differences in family, style, and variant can also be captured in tables, or a single constant amount $a_f$ can be added whenever any difference in any of these properties occurs. Comparing every font form to every other font form results in differences accumulating on the order of the square of the number of fonts. To be more in line with the first, simpler measure, one can divide by the number of fonts. The pseudocode to calculate this alternate measure would then look as follows:



    fd = 0
    for f1 from 1 to Nf
        for f2 from f1 + 1 to Nf
            fd = fd + | size(f1) - size(f2) | + FW[weight(f1), weight(f2)]
            if family(f1) differs from family(f2)
               or style(f1) differs from style(f2)
               or variant(f1) differs from variant(f2)
            then fd = fd + af
        end of f2 loop
    end of f1 loop
    fd = fd / Nf
    V_lf = 1 / (bf + fd)

[0517] In the last line of the above code, $b_f$ is a small positive number that controls how quickly the measure falls off with increasing font differences.
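The pseudocode above translates into the following Python sketch. The weight table `FW`, the constants `a_f = 3.0` and `b_f = 0.5`, and the `FontForm` record are hypothetical placeholders. Pairs are visited once each via `itertools.combinations`; self-pairs are skipped because they contribute nothing. Because the text requires rule values between 0 and 1 but $1/b_f$ exceeds 1 when only one font form is present, the result is clamped to 1 here; the clamp is an added assumption.

```python
from itertools import combinations
from typing import NamedTuple


class FontForm(NamedTuple):
    family: str
    size: float
    weight: str
    style: str
    variant: str


# Hypothetical weight-difference table FW[weight(f1), weight(f2)].
FW = {
    ("normal", "normal"): 0.0, ("bold", "bold"): 0.0,
    ("normal", "bold"): 2.0, ("bold", "normal"): 2.0,
}


def font_limitation(forms, a_f=3.0, b_f=0.5):
    """Difference-weighted font limitation V_lf = 1 / (b_f + f_d)."""
    F = list(dict.fromkeys(forms))  # list of distinct font forms, in order
    n_f = len(F)
    if n_f == 0:
        return 1.0
    f_d = 0.0
    for f1, f2 in combinations(F, 2):
        f_d += abs(f1.size - f2.size) + FW[(f1.weight, f2.weight)]
        if (f1.family != f2.family or f1.style != f2.style
                or f1.variant != f2.variant):
            f_d += a_f
    f_d /= n_f
    return min(1.0, 1.0 / (b_f + f_d))  # clamp to 1 (assumption)


a = FontForm("Times", 10, "normal", "normal", "normal")
b = FontForm("Times", 12, "bold", "normal", "normal")
print(font_limitation([a, b]))  # f_d = (2 + 2) / 2 = 2, so 1 / 2.5 = 0.4
```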

[0518] One further extension of the measure may be considered. Since font differences have a greater impact if the separate font forms are mixed together in the same paragraph than if they are spread over different paragraphs, one can count the number of font forms per paragraph and average this over the paragraphs of the document. The final accumulated difference measure $f_d$ can then be scaled by the average fonts-per-paragraph before the inversion to form $V_{lf}$.
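One way to sketch this extension, assuming each paragraph is available as a list of the font forms used in it and that $f_d$ has already been accumulated and divided by $N_f$ as in the pseudocode of [0516]. The function name and the default $b_f$ are hypothetical; the clamp to 1 is again an added assumption.

```python
def scaled_font_limitation(f_d, paragraphs, b_f=0.5):
    """Scale f_d by the average number of distinct font forms per paragraph.

    f_d        -- accumulated font difference, already divided by N_f
    paragraphs -- one list per paragraph of the font forms (any hashable keys)
    """
    counts = [len(set(p)) for p in paragraphs if p]  # skip empty paragraphs
    avg = sum(counts) / len(counts) if counts else 1.0
    return min(1.0, 1.0 / (b_f + f_d * avg))


# Fonts mixed inside one paragraph hurt more than fonts kept apart.
print(scaled_font_limitation(2.0, [["body"], ["head"]]))          # 1 / 2.5 = 0.4
print(scaled_font_limitation(2.0, [["body", "head"], ["body"]]))  # 1 / 3.5
```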

[0519] The particular methods for evaluating the contribution from the limitation of font forms to document comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from limitation of font forms should be considered within the scope of the present invention, for example, a function of measured human responses to the number of font forms with respect to the feeling of comfort; the present invention is directed not only to the particular method of determining the limitation of font forms contribution, but also to the much broader concept of using limitation of font form measures in the context of evaluating document comfort level and document quality.

In a preferred embodiment of the present invention, just as too many fonts are considered to be poor style, so are too many colors. A document with many colors is considered garish. The viewer tries to make sense of the colors, and a large number makes this a difficult and uncomfortable task; a large number of colors will also tire the eye. A simple measure of the effect is a count of the number of different colors found within the document. This can be determined by stepping through the document, identifying the colors, and saving them in a list (or other data structure such as a tree or hash table). As each color is encountered, it can be compared to the colors already in the list to determine whether or not it has been seen before. If it is a new color, it is added to the list. After the document has been processed, the number of entries in the list can be counted to give the total number of colors $N_c$. This can be converted to a number ranging from near 0 (for many colors) to 1 (for no more than a single color) by the expression:

$V_{lc} = 1 / \max(1, N_c)$
[0520] The above scheme works for constant, uniform colors such as those typically used in graphics, but does not address how to handle color sweeps or the huge number of colors seen in pictorial images. For color sweeps, one can restrict the list entry to only the first and last colors of the sweep. For pictorial images, one can ignore them altogether, extract a few colors from the image by subsampling, or extract a few colors by a cluster analysis of the image values in color space.

[0521] The test for whether a color is already in the list does not have to be a strict match. One can compare colors by computing the distance between them in color space and comparing the distance to a threshold. If the distance is below the threshold, the colors can be considered close enough to match, and a new color list entry is not needed.
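The list-based count of colors, together with the threshold test of [0521], might be sketched as below. Colors are assumed to be RGB triples in [0, 1], and plain Euclidean distance in RGB stands in for the unspecified color space; a perceptually uniform space would likely match human judgment better. The function name is hypothetical, and a threshold of 0 reduces the test to a strict match.

```python
import math


def count_colors(colors, threshold=0.0):
    """Return (N_c, V_lc) with V_lc = 1 / max(1, N_c).

    A color counts as already seen when its distance to a listed color
    is at most `threshold`; with threshold 0 only exact matches merge.
    """
    seen = []
    for c in colors:
        if all(math.dist(c, s) > threshold for s in seen):
            seen.append(c)  # a new color: add it to the list
    n_c = len(seen)
    return n_c, 1.0 / max(1, n_c)


palette = [(1.0, 0.0, 0.0), (0.98, 0.0, 0.0), (0.0, 0.0, 1.0)]
print(count_colors(palette))                  # N_c = 3
print(count_colors(palette, threshold=0.05))  # N_c = 2, V_lc = 0.5
```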
[0522] Comfort can depend on the choice of colors as well as on their number. One might therefore compare the colors of the document pair-wise and accumulate a measure of their compatibility. A simple value to accumulate would be the distance between the colors in a color space, but a better measure of the effect on comfort would be the color dissonance of the pair. Since comparing colors pair-wise accumulates values as the square of the number of colors, one can divide the total by the number of colors in the document to get a measure that varies linearly with the number of colors.

[0523] Not every color is equally tiring on the eye, and more sophisticated measures can take this into account. Strongly saturated colors have more of an effect than neutral ones. There are several possible ways to calculate an approximate saturation value that can be used to augment a color's discomfort contribution; these were described in the above discussion of colorfulness under the eye-catching ability property.

[0524] For each color in the list, one can add a contribution to a total color discomfort measure. The contribution can be a function of the saturation. For example, for the $i$th color with saturation $q_i$, the contribution might be $a_c + q_i$, where $a_c$ is a constant value representing the effect of just having another color and $q_i$ is the additional discomfort due to that color's saturation:

$d_c = a_c N_c + \sum_i q_i$

where $d_c$ is the color discomfort measure.

[0525] It is also possible to keep track of the total document area rendered in each color and include a function of 
both the saturation and the area in the augmentation of the discomfort calculation. The idea here is that the effect of 
a large colored area is stronger than the effect of a small one. 
[0526] An expression such as $V_{lc} = 1 / (b_c + d_c)$, where $b_c$ is a small positive constant, can be used to convert the discomfort measure into a limitation of color measure that varies between 0 and 1.
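Paragraphs [0524] and [0526] can be combined in a short sketch. Here the HSV saturation returned by Python's standard `colorsys.rgb_to_hsv` stands in for the saturation estimates discussed under colorfulness, which is an assumption; so are the defaults $a_c = 1$ and $b_c = 0.1$ and the value of 1 returned for a document with no colors.

```python
import colorsys


def color_limitation(colors, a_c=1.0, b_c=0.1):
    """V_lc = 1 / (b_c + d_c), with d_c = a_c * N_c + sum_i q_i.

    colors -- distinct (r, g, b) triples with components in [0, 1]
    """
    n_c = len(colors)
    if n_c == 0:
        return 1.0
    # q_i: HSV saturation of each color (0 for grays, 1 for pure hues)
    d_c = a_c * n_c + sum(colorsys.rgb_to_hsv(*c)[1] for c in colors)
    return 1.0 / (b_c + d_c)


print(color_limitation([(0.5, 0.5, 0.5)]))  # neutral gray: 1 / 1.1
print(color_limitation([(1.0, 0.0, 0.0)]))  # saturated red: 1 / 2.1
```

With $a_c \ge 1$ and at least one color, $d_c \ge 1$, so the result stays below 1.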

[0527] The particular methods for evaluating the contribution from the limitation of colors to document comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from limitation of colors should be considered within the scope of the present invention, for example, a function of measured human responses to the number of colors with respect to the feeling of comfort; the present invention is directed not only to the particular method of determining the limitation of colors contribution, but also to the much broader concept of using limitation of color measures in the context of evaluating document comfort level and document quality.

[0528] In a preferred embodiment of the present invention, people are more comfortable with some group sizes than others. A group should not have too many or too few elements, and odd numbers are preferred over even. The best size for a group is 3 elements. A simple expression for the comfort of a group number is:

$G_c = 1 / \left(e_g + a_g\,(1 - \mathrm{MOD2}(e_g))\right)$

where $e_g$ is the number of elements in the group, $a_g$ is a constant that gives the added discomfort of an even number of elements, and MOD2 is a function that gives 0 if its argument is even and 1 if it is odd.
[0529] For an entire document, one needs some method of averaging the grouping number comfort values over all groups. For example, if there are $N_g$ groups in the document and the comfort value of the $i$th group is $G_{c,i}$, then the simple average over all groups yields:

$V_{gn} = \sum_i G_{c,i} / N_g$
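A small sketch of the grouping-number measure, assuming the group sizes $e_g$ have already been read from the document's logical structure. The default $a_g = 1$, the function names, and the value returned when there are no groups are illustrative. As written, $G_c$ also decreases as odd-sized groups grow, so it mainly penalizes large and even-sized groups.

```python
def group_comfort(e_g, a_g=1.0):
    """G_c = 1 / (e_g + a_g * (1 - MOD2(e_g))) for a group of e_g >= 1 elements."""
    mod2 = e_g % 2  # 1 if odd, 0 if even
    return 1.0 / (e_g + a_g * (1 - mod2))


def grouping_number(group_sizes, a_g=1.0):
    """V_gn: simple average of G_c over all N_g groups."""
    if not group_sizes:
        return 1.0  # assumption: no groups, no penalty
    return sum(group_comfort(e, a_g) for e in group_sizes) / len(group_sizes)


print(group_comfort(3), group_comfort(4))  # 1/3 versus 1/5
```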

[0530] More complex averaging schemes are possible. For example, one could weight the effect of the grouping 
number comfort differently depending on the placement of the group within the hierarchy of the document's logical 
structure tree. 

[0531] The particular methods for evaluating the contribution from the grouping number to document comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from the grouping number should be considered within the scope of the present invention, for example, a function of measured human responses to the number of group elements with respect to the feeling of comfort; the present invention is directed not only to the particular method of determining the grouping number contribution, but also to the much broader concept of using group size measurements in the context of evaluating document comfort level and document quality.

[0532] In a preferred embodiment of the present invention, people are generally more comfortable with a neat document than with a messy one. One can quantify neatness as a combination of contributing factors. In many cases it is easier to identify a factor that makes a document messy and use the inverse of such factors. An example of a neatness measure is offered based on text neatness, border and background presence, alignment, and/or regularity. Neatness estimates that employ additional factors are possible. In combining the component neatness measures, assume that any source of messiness will destroy the overall neatness (just as was argued for overall comfort).
[0533] A similar combining formula can be used:

$V_{nt} = \left[\sum_i w_i (d + V_i)^{-p}\right]^{-1/p} - d$

only now the $V_i$ are taken from the set $V_{tn}$, $V_{bb}$, $V_{al}$, and $V_{rg}$ for the text neatness, border/background, alignment, and regularity. The weights $w_i$ and parameters $p$ and $d$ can be different from those used in calculating comfort.

[0534] The particular methods for evaluating document neatness provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the document neatness should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to the feeling of neatness; the present invention is directed not only to the particular method of determining the neatness level, but also to the much broader concept of using a combination of individual measures in the context of evaluating document neatness, comfort level, and document quality.
[0535] A combination of measures, as illustrated in Figure 83, is useful in evaluating the document's neatness. 

[0536] More specifically, the neatness, as illustrated in Figure 83, is considered a combination of text neatness, border and background presence, alignment, and/or regularity. In Figure 83, the quantized neatness value is derived by combining the text neatness, border and background presence, alignment, and/or regularity using a neatness quantizer or combiner circuit 60.

[0537] Although the illustration shows a circuit for the neatness quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described below may be used.
[0538] In a preferred embodiment of the present invention, as an example of how factors can contribute to neatness, consider the neatness of text. Text neatness can be harmed by the use of some font variants and styles (such as underscored text or italics). Quoted text is also considered to be less neat than unquoted text. One can step through the document examining the text, considering every word, space, and punctuation mark. For words (and punctuation), determine a neatness value based on the font used ($f$). Consider the font family, style, and variant when estimating the font (un)neatness or messiness. These properties can be considered independently, and look-up tables ($T_f$, $T_s$, and $T_v$) can be used to store the messiness effect for each. A total messiness measure can collect the effect of the font choice:

$m_t = m_t + T_f[\mathrm{family}(f)] + T_s[\mathrm{style}(f)] + T_v[\mathrm{variant}(f)]$
[0539] For punctuation, look for quotation marks and add an extra contribution for the quotation. In general, one can add a contribution based on the character code $c$, and a table $T_c$ can store the contribution amounts. This can apply to spaces, letters, and numbers as well as punctuation:

$m_t = m_t + T_c[c]$

[0540] The contributions from font and character can be chosen such that the total messiness contribution for a character never exceeds 1.

[0541] To get an average value for text messiness, sum the messiness value for each character ($m_{t,i}$ for the $i$th character) and divide by the total number of characters $N_{ch}$. The text neatness is the inverse of the messiness:

$V_{tn} = 1 - \sum_i m_{t,i} / N_{ch}$
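The per-character accumulation of [0538] through [0541] might look like this in Python. The tables `T_F`, `T_S`, `T_V`, and `T_C` and their values are hypothetical; unknown entries default to 0, and each character's messiness is capped at 1 to honor [0540].

```python
# Hypothetical messiness tables; the values are illustrative only.
T_F = {"serif": 0.0, "sans": 0.0, "script": 0.4}                # family
T_S = {"normal": 0.0, "italic": 0.3}                            # style
T_V = {"normal": 0.0, "underline": 0.4, "strikethrough": 0.5}   # variant
T_C = {'"': 0.2, "\u201c": 0.2, "\u201d": 0.2}                 # quote marks


def text_neatness(chars):
    """V_tn = 1 - (sum of per-character messiness) / N_ch.

    chars -- iterable of (character, family, style, variant) tuples
    """
    total, n_ch = 0.0, 0
    for c, family, style, variant in chars:
        m = T_F.get(family, 0.0) + T_S.get(style, 0.0) + T_V.get(variant, 0.0)
        m += T_C.get(c, 0.0)
        total += min(m, 1.0)  # keep each character's messiness <= 1
        n_ch += 1
    return 1.0 if n_ch == 0 else 1.0 - total / n_ch


sample = [("a", "serif", "normal", "normal"), ("b", "serif", "italic", "normal")]
print(text_neatness(sample))  # 1 - 0.3 / 2
```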

[0542] Figure 84 illustrates an example of a neater document. Figure 85 illustrates an example of a less neat document.

[0543] The particular methods for evaluating the contribution from text neatness to document neatness and comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from text neatness should be considered within the scope of the present invention, for example, a function of measured human responses to different text styles with respect to the feeling of neatness; the present invention is directed not only to the particular method of determining the text neatness contribution, but also to the much broader concept of using text neatness measures in the context of evaluating document neatness level, document comfort level, and document quality.

[0544] In a preferred embodiment of the present invention, the use of borders and backgrounds can aid in understanding the document's structure and can add to the document's interest, but it also results in a document that is not quite as neat as one without these additions. A document offers several opportunities for borders and/or backgrounds: they can be found on each page, or for columns, sections, tables, or figures. Step through the document considering each opportunity for a border or background. At each such opportunity, check whether a border or a background is actually present. If a border is present, add the amount $v_{bd}$ to a messiness measure $m_{bb}$. If a background is present, add the amount $v_{bk}$ to $m_{bb}$. Also count the number of opportunities encountered, $N_b$. The neatness contribution from borders and backgrounds is the inverse of their average messiness:

$V_{bb} = 1 - m_{bb} / N_b$
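A sketch of the border and background tally, assuming each opportunity (page, column, section, table, or figure) has been reduced to a pair of flags. The amounts $v_{bd} = 0.3$ and $v_{bk} = 0.5$ and the function name are hypothetical; keeping their sum at or below 1 keeps $V_{bb}$ between 0 and 1.

```python
def border_background_neatness(opportunities, v_bd=0.3, v_bk=0.5):
    """V_bb = 1 - m_bb / N_b.

    opportunities -- list of (has_border, has_background) boolean pairs
    """
    n_b = len(opportunities)
    if n_b == 0:
        return 1.0
    m_bb = 0.0
    for has_border, has_background in opportunities:
        if has_border:
            m_bb += v_bd
        if has_background:
            m_bb += v_bk
    return 1.0 - m_bb / n_b


print(border_background_neatness([(True, True), (False, False)]))  # 1 - 0.8 / 2
```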

[0545] The particular methods for evaluating the contribution from borders and backgrounds to document neatness and comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from borders and backgrounds should be considered within the scope of the present invention, for example, a function of measured human responses to different border and background styles with respect to the feeling of neatness and comfort; the present invention is directed not only to the particular method of determining the border and background contribution, but also to the much broader concept of using border and background measures in the context of evaluating document neatness level, document comfort level, and document quality.
[0546] In a preferred embodiment of the present invention, an important contributor to neatness is the impression that the document components are aligned and regularly positioned. These factors were described above in the discussion of document aesthetics. Using the techniques described there, measures $V_{al}$ and $V_{rg}$ for document alignment and regularity can be calculated. Note that the weighting factors for their contribution to neatness are likely to be different from the factors used in their contribution to aesthetics.

[0547] The particular methods for evaluating the contribution from alignment and regularity to document neatness and comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from alignment and regularity should be considered within the scope of the present invention, for example, a function of measured human responses to different degrees of alignment and regularity with respect to the feeling of neatness and comfort; the present invention is directed not only to the particular method of determining the alignment and regularity contribution, but also to the much broader concept of using alignment or regularity measures in the context of evaluating document neatness level, document comfort level, and document quality.

[0548] In a preferred embodiment of the present invention, some text takes more work to decipher and understand than other text does. Text printed in italics or using an abnormal font variant is harder to read. Light-colored text on a light background, or dark text on a dark background, takes an effort to decipher. This work will tire the reader and make the document uncomfortable to use. A method for estimating the average decipherability of a document, $V_{dc}$, was described above in the discussion of how well a document communicates.

[0549] The particular methods for evaluating the contribution from text decipherability to document comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from text decipherability should be considered within the scope of the present invention, for example, a function of measured human responses to different text styles with respect to decipherability and the feeling of comfort; the present invention is directed not only to the particular method of determining the text decipherability contribution, but also to the much broader concept of using text decipherability measures in the context of evaluating document comfort level and document quality.

[0550] In a preferred embodiment of the present invention, some document constructs can act to intimidate the reader. By noting the degree to which these factors are present, one can form an intimidation measure. Intimidation acts against comfort, so the inverse of the intimidation factor should contribute to the comfort estimation. Factors that intimidate include a low amount of white space, high information density, low legibility, bold text, a low picture fraction, line use, and/or a high technical level. Many of these factors are familiar from IRS forms.

[0551] A non-intimidation measure is calculated by combining the inverses of the factors that intimidate. To combine the various contributions to the document's non-intimidation factor, a simple weighted average is used, although more complex combination schemes are possible:

$V_{ni} = \sum_i w_i V_i$

where $w_i$ are the weights and the $V_i$ are the non-intimidation component values $V_{ws}$, $V_{il}$, $V_{lg}$, $V_{dc}$, $V_{nb}$, $V_{pf}$, $V_{nl}$, $V_{nt}$ corresponding to the above list of factors.
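The weighted average is straightforward; the sketch below keys the components by the subscripts used above. The component values and weights in the example are hypothetical, and the weights should sum to 1 so that $V_{ni}$ stays between 0 and 1.

```python
def non_intimidation(components, weights):
    """V_ni = sum_i w_i * V_i over the components named in `weights`."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing component values: {sorted(missing)}")
    return sum(w * components[name] for name, w in weights.items())


values = {"ws": 0.6, "il": 0.8, "lg": 0.9, "dc": 0.9,
          "nb": 1.0, "pf": 0.2, "nl": 0.95, "nt": 0.7}
equal = {name: 1 / len(values) for name in values}
print(non_intimidation(values, equal))
```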
[0552] A combination of measures, as illustrated in Figure 86, is useful in evaluating the document's intimidation. 
[0553] More specifically, the intimidation, as illustrated in Figure 86, is considered a combination of a low amount of white space, high information density, low legibility, bold text, a low picture fraction, line use, and/or a high technical level. In Figure 86, the quantized intimidation value is derived by combining a low amount of white space, high information density, low legibility, bold text, a low picture fraction, line use, and/or a high technical level using an intimidation quantizer or combiner circuit 62.

[0554] Although the illustration shows a circuit for the intimidation quantization process, this process may also be performed in software by the microprocessor and/or in firmware. The quantization is not limited to specific circuits; any combination of software and/or hardware that is able to carry out the methodologies described below may be used.
[0555] Figure 87 is an example of an intimidating document. 

[0556] The particular methods for evaluating a measure of how intimidating or non-intimidating a document is, provided herein, are exemplary and are not to be considered as limiting in scope. Other methods for determining the document intimidation level should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to the feeling of intimidation; the present invention is directed not only to the particular method of determining the intimidation level, but also to the much broader concept of using a combination of individual measures in the context of evaluating document intimidation level, document comfort level, and document quality.

[0557] In a preferred embodiment of the present invention, documents that are "open", with lots of white space, are not as intimidating as those that are filled with content. A method for estimating the white space fraction was described above in the discussion of how well a document communicates.

[0558] The non-white-space area can be estimated by totaling the areas $A_i$ of the content objects. The total object area can be scaled by the total document area $A_d$:

$V_{ws} = (A_d - \sum_i A_i) / A_d$
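A minimal sketch, assuming the content objects' areas $A_i$ are already known and expressed in the same units as $A_d$. Simply summing areas double-counts overlapping objects, so the result is clamped at 0 here; both that clamp and the function name are assumptions.

```python
def white_space_fraction(doc_area, object_areas):
    """V_ws = (A_d - sum_i A_i) / A_d, clamped to be non-negative."""
    if doc_area <= 0:
        raise ValueError("document area must be positive")
    return max(0.0, (doc_area - sum(object_areas)) / doc_area)


print(white_space_fraction(100.0, [20.0, 30.0]))  # 0.5
```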
[0559] The particular methods for evaluating the contribution from white space to document intimidation level and comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from white space should be considered within the scope of the present invention, for example, a function of measured human responses to different white space amounts with respect to the feeling of intimidation; the present invention is directed not only to the particular method of determining the white-space contribution, but also to the much broader concept of using white space measures in the context of evaluating document intimidation level, document comfort level, and document quality.

[0560] In a preferred embodiment of the present invention, densely packed information is intimidating, so the inverse of the information density can contribute to the non-intimidation measure. Such an information lightness measure was described above in the discussion of a document's eye-catching ability.

[0561] The particular methods for evaluating the contribution from information lightness or density to document intimidation level and comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from information density should be considered within the scope of the present invention, for example, a function of measured human responses to different amounts of information and area with respect to the feeling of intimidation; the present invention is directed not only to the particular method of determining the information lightness or density contribution, but also to the much broader concept of using information density measures in the context of evaluating document intimidation level, document comfort level, and document quality.

[0562] In a preferred embodiment of the present invention, an illegible document is intimidating, so legibility should contribute to the non-intimidation measure. A method for estimating legibility was described in the above discussion of a document's ability to communicate.

[0563] The particular methods for evaluating the contribution from text legibility to document intimidation level and comfort level, provided herein, are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from legibility should be considered within the scope of the present invention, for example, a function of measured human responses to different text characteristics with respect to legibility and the feeling of intimidation; the present invention is directed not only to the particular method of determining the legibility contribution, but also to the much broader concept of using legibility measures in the context of evaluating document intimidation level, document comfort level, and document quality.

[0564] In a preferred embodiment of the present invention, the use of bold or heavy-weight text is intimidating. Since a non-intimidation measure is desired, one would like to have a text lightness measure (high values associated with light text weights). A method for determining such a measure is straightforward. Step through the document and examine the text to see what fonts are used. One can use a table $T_l$ to look up a lightness value $t_l$ for the weight of the font $f$:

$t_l = T_l[\mathrm{weight}(f)]$

[0565] If $t_{l,i}$ is the lightness value for the $i$th character, then one can find an average lightness (non-boldness) value by summing the lightness values and dividing by the total number of characters $N_{ch}$:

$V_{nb} = \sum_i t_{l,i} / N_{ch}$

[0566] An alternative approach is to collect the area of the bold or heavy text, $A_b$, then divide by the total area of the document $A_d$ and invert:

$V_{nb} = 1 - A_b / A_d$
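Both variants of the non-boldness measure fit in a few lines. The lightness table `T_L`, its values, and the function names are hypothetical; the character-based version expects one font-weight name per character.

```python
# Hypothetical lightness table T_l: high values for light weights.
T_L = {"light": 1.0, "normal": 1.0, "semibold": 0.5, "bold": 0.2, "black": 0.0}


def non_boldness_by_chars(char_weights):
    """V_nb = sum_i t_l,i / N_ch, with t_l,i = T_l[weight of character i]."""
    if not char_weights:
        return 1.0
    return sum(T_L[w] for w in char_weights) / len(char_weights)


def non_boldness_by_area(bold_area, doc_area):
    """V_nb = 1 - A_b / A_d."""
    return 1.0 - bold_area / doc_area


print(non_boldness_by_chars(["normal", "normal", "bold", "normal"]))  # 3.2 / 4
print(non_boldness_by_area(10.0, 100.0))                              # 0.9
```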

[0567] The particular methods for evaluating the contribution from bold text to document intimidation level and comfort level, provided herein, are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from bold text should be considered within the scope of the present invention, for example, a function of measured human responses to different bold text amounts with respect to the feeling of intimidation and document comfort level; the present invention is directed not only to the particular method of determining the bold text contribution, but also to the much broader concept of using bold text measures in the context of evaluating document intimidation level, document comfort level, and document quality.

[0568] In a preferred embodiment of the present invention, the presence of vertical lines can be intimidating, especially thick ones with high contrast. A method for quantifying the effect of vertical lines is to first step through the document and find them. This includes vertical lines that are part of borders and also rectangles with a ratio of width to height less than a threshold value. For each line discovered, multiply its area $A_{l,i}$ by its luminance contrast $c_{l,i}$.

[0569] Sum all the weighted areas and divide by the area of the document $A_d$ to get a value between 0 and 1. Since the area devoted to vertical lines is typically small, this expression understates the effect, but raising it to a fractional power (an exponent $1/p$ with $p > 1$) can boost its strength. One then needs to invert the result to get the non-intimidation contribution:

$V_{nl} = 1 - \left(\sum_i c_{l,i} A_{l,i} / A_d\right)^{1/p}$
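A sketch of the vertical-line measure under simplifying assumptions: every candidate (border segment or rectangle) is given as a width, a height, and a luminance contrast in [0, 1], and anything whose width-to-height ratio is below `ratio_threshold` counts as a vertical line. The `Rect` record, the threshold 0.1, and p = 2 are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    width: float
    height: float
    contrast: float  # luminance contrast c_l against the background, 0..1


def non_line_measure(rects, doc_area, ratio_threshold=0.1, p=2.0):
    """V_nl = 1 - (sum_i c_l,i * A_l,i / A_d) ** (1 / p)."""
    weighted = sum(r.contrast * r.width * r.height
                   for r in rects
                   if r.height > 0 and r.width / r.height < ratio_threshold)
    fraction = min(1.0, weighted / doc_area)  # keep the base within [0, 1]
    return 1.0 - fraction ** (1.0 / p)


rule = Rect(width=0.1, height=10.0, contrast=1.0)   # thin vertical rule
box = Rect(width=10.0, height=1.0, contrast=1.0)    # wide box, not a line
print(non_line_measure([rule, box], doc_area=100.0))  # 1 - 0.01 ** 0.5
```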

[0570] The particular methods for evaluating the contribution from vertical lines to document intimidation level and comfort level, provided herein, are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from lines should be considered within the scope of the present invention, for example, a function of measured human responses to different line quantities and styles with respect to the feeling of intimidation and document comfort level; the present invention is directed not only to the particular method of determining the vertical line contribution, but also to the much broader concept of using line measures in the context of evaluating document intimidation level, document comfort level, and document quality.

[0571] In a preferred embodiment of the present invention, highly technical material is intimidating. The measure of technical level includes such things as reading ease, the presence of numbers, and the absence of pictures. A definition of an example technical level measure is given above in the discussion of how well a document communicates. The technical level $V_{tl}$ can be inverted to give a measure of non-technical level that can be used in the non-intimidation calculation:

$V_{nt} = 1 - V_{tl}$

[0572] The particular methods for evaluating the contribution from the technical level to document intimidation level and comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from the technical level should be considered within the scope of the present invention, for example, a function of measured human responses to different document content with respect to technical level, the feeling of intimidation and the document comfort level; such that the present invention is directed not only to the particular method of determining the technical level contribution, but also to the much broader concept of using technical-level measures in the context of evaluating document intimidation level, document comfort level and document quality.

[0573] In a preferred embodiment of the present invention, people have certain expectations about document styles; there are conventions that they are accustomed to. Violating such conventions may yield some benefits (such as attracting attention) and incur costs (such as reduced ease of use). Violating convention almost always creates a little discomfort.
[0574] Conventionality is defined as the inverse of novelty. A measure of novelty was presented above in the discussion of how well a document holds interest.

[0575] The particular methods for evaluating the contribution from document conventionality to document comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from conventionality should be considered within the scope of the present invention, for example, a function of measured human responses to different document styles with respect to conventionality and the feeling of comfort; such that the present invention is directed not only to the particular method of determining the conventionality contribution, but also to the much broader concept of using conventionality measures in the context of evaluating document comfort level and document quality.

[0576] In a preferred embodiment of the present invention, some combinations of colors fit harmoniously together while others clash. Clashing or dissonant colors tire the eye and cause discomfort, while harmonious colors can soothe the viewer. Color harmony is defined as the inverse of the color dissonance $V_d$, which was described above in the discussion of a document's eye-catching ability. The color harmony is then:

$$V_{ch} = 1 - V_d$$

[0577] The particular methods for evaluating the contribution from color harmony to document comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from color harmony should be considered within the scope of the present invention, for example, a function of measured human responses to different document color combinations with respect to color harmony and the feeling of comfort; such that the present invention is directed not only to the particular method of determining the color harmony contribution, but also to the much broader concept of using color harmony measures in the context of evaluating document comfort level and document quality.

[0578] In a preferred embodiment of the present invention, another aspect of what is expected is the appropriateness of the color choices. The document design rule is that large background areas should use desaturated colors while small foreground objects should use saturated colors. One can form a measure of color inappropriateness by multiplying each object's area by its saturation. The area should actually be measured as a fraction of the total document area $A_d$ in order to restrict the result to the range of 0 to 1. A large result comes from a large area with a high saturation (which is inappropriate). To obtain a value for the entire document, one must combine the values from all objects; with a simple weighting of saturation by area, however, many small saturated foreground objects could add up to a measure of inappropriate color use when their use may actually be appropriate. A better measure is to raise the area fraction to a power, which further reduces the influence of small objects. This leads to a color appropriateness measure of the form:

$$V_{ca} = 1 - \sum_i c_i \left( \frac{A_i}{A_d} \right)^p$$

where $c_i$ is the saturation of the $i$th object, $A_i$ is its area, and $p$ is a value greater than 1.
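A minimal Python sketch of the color appropriateness measure follows, assuming each object is given as a hypothetical `(area, saturation)` pair with saturation in [0, 1]. The default exponent p = 2 is an illustrative choice, and the clamp at 0 is an added safeguard for overlapping objects rather than part of the described formula.

```python
def color_appropriateness(objects, doc_area, p=2.0):
    """V_ca = 1 - sum_i c_i * (A_i / A_d) ** p, with c_i the saturation of object i.

    objects: iterable of (area, saturation) pairs (hypothetical input format).
    p > 1 shrinks the contribution of small objects more than large ones.
    """
    if p <= 1:
        raise ValueError("p should be greater than 1")
    penalty = sum(sat * (area / doc_area) ** p for area, sat in objects)
    # Overlapping objects could push the sum past 1; clamp to keep V_ca in [0, 1].
    return max(0.0, 1.0 - penalty)


# A pale background plus ten small, fully saturated accents.
page = [(8000, 0.1)] + [(100, 1.0)] * 10
print(color_appropriateness(page, 10_000))
```

With p = 2 the ten accents add only 10 × 0.01² = 0.001 to the penalty, whereas a plain area weighting (p = 1) would have added 0.1, so the small saturated objects no longer mask the dominant contribution of the background.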

[0579] The particular methods for evaluating the contribution from color appropriateness to document comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from color appropriateness should be considered within the scope of the present invention, for example, a function of measured human responses to different object colors with respect to color appropriateness and the feeling of comfort; such that the present invention is directed not only to the particular method of determining the color appropriateness contribution, but also to the much broader concept of using color appropriateness measures in the context of evaluating document comfort level and document quality.

[0580] In a preferred embodiment of the present invention, the rule for consistency of luminance states that for a group of content elements, the dark elements should come first and the lighter elements should follow. Note, however, that the logical structure of a document is typically a tree with each branch node representing a group. Thus the members of a group are often other groups, and the content elements may not be simple objects with a single color and luminance. The consistency of luminance rule can still be applied, but the luminance used should be the average luminance of each group member's subtree.

[0581] To determine the average luminance of an object, get the foreground luminance of the object $L_f$, the luminance of the background $L_b$, the area covered by the foreground color $A_f$, and the bounding area of the object $A_o$. The average luminance $L_{av}$ is then:

$$L_{av} = \frac{L_f A_f + L_b (A_o - A_f)}{A_o}$$

[0582] The average luminance for a group of objects is the sum of the average luminance values of its members weighted by their areas, plus the contribution from the background, normalized by the group's bounding area. If $A_g$ is the bounding area of the group, $L_{av,i}$ is the average luminance of the $i$th group member and $A_i$ is the area of that member, then the average luminance of the group $L_{avg}$ is:

$$L_{avg} = \frac{\sum_i L_{av,i} A_i + L_b \left( A_g - \sum_i A_i \right)}{A_g}$$
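The two averaging formulas translate directly into code. The sketch below assumes luminance values on a 0 (black) to 1 (white) scale and represents group members as hypothetical `(average_luminance, area)` pairs; the function names are illustrative.

```python
def object_average_luminance(l_fg, l_bg, a_fg, a_obj):
    """L_av = (L_f*A_f + L_b*(A_o - A_f)) / A_o for a single object."""
    return (l_fg * a_fg + l_bg * (a_obj - a_fg)) / a_obj


def group_average_luminance(members, l_bg, a_group):
    """L_avg for a group: area-weighted member luminances plus the background
    showing through the part of the bounding area A_g not covered by members.

    members: iterable of (average_luminance, area) pairs.
    """
    members = list(members)
    covered = sum(area for _, area in members)
    weighted = sum(lum * area for lum, area in members)
    return (weighted + l_bg * (a_group - covered)) / a_group


# Black text covering 25% of a 100-unit box on a white background.
text = object_average_luminance(0.0, 1.0, 25, 100)                    # 0.75
print(group_average_luminance([(text, 100), (0.5, 100)], 1.0, 400))   # 0.8125
```

Because a member's average luminance already folds in its own background, the group formula can be applied recursively up the content tree.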

[0583] To find a measure of the consistency of luminance for a group, step through the members of the group and find the average luminance of each member. Compare that luminance to the previous member's luminance and, if the new luminance is darker than the old, collect the difference. This actually gives a measure of the inconsistency, and one can use a reciprocal function to convert it to a consistency value ranging between 0 and 1. The method is illustrated by the following pseudocode:

incon = 0
oldlum = AverageLuminance(groupMember(1))
for i = 2 to number of group members
{
    newlum = AverageLuminance(groupMember(i))
    if newlum < oldlum
        then incon = incon + oldlum - newlum
    oldlum = newlum
} end of loop
Vclg = acl / (acl + incon)

Here Vclg is the consistency of luminance value for the group and acl is a small positive constant value.
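A runnable Python rendering of the consistency-of-luminance pseudocode is sketched below. It assumes the members' average luminances have already been computed (for example with the averaging formulas of [0581] and [0582]) and are passed in document order; the default a_cl = 0.05 and the treatment of an empty group are illustrative assumptions.

```python
def luminance_consistency(member_luminances, a_cl=0.05):
    """Consistency of luminance V_clg for one group (dark-first rule).

    member_luminances: average luminance of each member, in document order,
    on a 0 (dark) to 1 (light) scale. a_cl is a small positive constant.
    """
    if not member_luminances:
        return 1.0  # an empty group cannot violate the rule
    incon = 0.0
    old = member_luminances[0]
    for new in member_luminances[1:]:
        if new < old:          # a darker member follows a lighter one
            incon += old - new
        old = new              # always compare against the immediately preceding member
    return a_cl / (a_cl + incon)


print(luminance_consistency([0.1, 0.4, 0.9]))   # dark to light: 1.0
print(luminance_consistency([0.9, 0.5]))        # light then dark: penalized
```

The choice of a_cl sets how quickly the score falls: an inconsistency equal to a_cl already halves the value.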
[0584] The above method indicates how to calculate a measure for each node in the content tree, but does not say how to obtain a collective value for the tree as a whole. One method for doing this is to form a weighted average of all the tree node values, where the weight is a function of the node's depth in the tree. One can also raise the values being combined to a negative power, so that a bad consistency value carries the impact of many good values. This can be summarized as:

$$V_{cl} = \left( \frac{\sum_j w_j \left( d_{cl} + V_{cl,j} \right)^{-p}}{\sum_j w_j} \right)^{-1/p} - d_{cl}$$

where the sums are over all group nodes in the content tree, $w_j$ is the node depth, $V_{cl,j}$ is the consistency of luminance of the node, $d_{cl}$ is a small positive constant and $p$ is a positive value such as 1.
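The depth-weighted combination over the content tree can be sketched as below. Node values and depths are passed as parallel lists (a hypothetical interface); depths are assumed to start at 1 at the root so that every node receives a nonzero weight, and the defaults d = 0.05 and p = 1 are illustrative. The same function applies unchanged to the consistency-of-size values of [0589].

```python
def combine_tree_values(values, depths, d=0.05, p=1.0):
    """((sum_j w_j (d + V_j)^-p) / sum_j w_j)^(-1/p) - d with w_j = node depth.

    The negative exponent lets a single poor node value pull the result down
    far more than a plain weighted average would.
    """
    if len(values) != len(depths) or not values:
        raise ValueError("need one depth per node value")
    num = sum(w * (d + v) ** (-p) for v, w in zip(values, depths))
    return (num / sum(depths)) ** (-1.0 / p) - d


print(combine_tree_values([1.0, 1.0, 1.0], [1, 2, 2]))   # all consistent: ~1.0
print(combine_tree_values([1.0, 0.0], [1, 1]))           # one bad node dominates
```

In the second call the plain average would be 0.5, but the negative power drives the result below 0.05, which is the intended "one bad value ruins the whole" behavior.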

[0585] Figure 88 is an example of consistent luminance. Figure 89 is an example of inconsistent luminance. 
[0586] The particular methods for evaluating the contribution from the consistency of luminance to document comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from the consistency of luminance should be considered within the scope of the present invention, for example, a function of measured human responses to different object luminance values and orderings with respect to consistency of luminance and the feeling of comfort; such that the present invention is directed not only to the particular method of determining the consistency of luminance contribution, but also to the much broader concept of using consistency of luminance measures in the context of evaluating document comfort level and document quality.
[0587] In a preferred embodiment of the present invention, the design rule for consistency of size is that for a group of content elements, the large elements should come first and the smaller elements should follow. To find a measure of the consistency of size for a group, step through the members of the group and find the bounding size of each member. Compare that size to the previous member's size and, if the new size is bigger than the old, collect the difference. This actually gives a measure of the inconsistency, and one can use a reciprocal function to convert it to a consistency value ranging between 0 and 1. The method is illustrated by the following pseudocode:

incon = 0
oldsize = BoundingSize(groupMember(1))
for i = 2 to number of group members
{
    newsize = BoundingSize(groupMember(i))
    if newsize > oldsize
        then incon = incon + newsize - oldsize
    oldsize = newsize
} end of loop
Vcsg = acs / (acs + incon)

Here Vcsg is the consistency of size value for the group and acs is a small positive constant value.
[0588] In considering the members of the group, one may wish to exclude certain special members (such as headings) from the size comparisons.
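The consistency-of-size loop, including the optional exclusion of special members from [0588], might look like this in Python. Members are hypothetical `(bounding_size, kind)` tuples; treating bounding size as a fraction of the page area, the `"heading"` kind label, and a_cs = 0.05 are all assumptions made for the sketch.

```python
def size_consistency(members, a_cs=0.05, excluded_kinds=("heading",)):
    """Consistency of size V_csg for one group (large-first rule).

    members: (bounding_size, kind) tuples in document order. Sizes are assumed
    normalized (e.g. fraction of page area) so a_cs is meaningful relative to them.
    Members whose kind is in excluded_kinds are skipped, per [0588].
    """
    sizes = [size for size, kind in members if kind not in excluded_kinds]
    incon = 0.0
    # Pairwise walk over consecutive members: previous vs. current size.
    for old, new in zip(sizes, sizes[1:]):
        if new > old:          # a larger member follows a smaller one
            incon += new - old
    return a_cs / (a_cs + incon)


group = [(0.05, "heading"), (0.4, "body"), (0.2, "body"), (0.1, "body")]
print(size_consistency(group))                     # heading ignored: 1.0
print(size_consistency(group, excluded_kinds=()))  # small heading first is penalized
```

Without the exclusion, the small heading that precedes the larger body items would be counted as a size violation even though it is conventional layout.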

[0589] The above method indicates how to calculate a measure for each node in the content tree, but does not say how to obtain a collective value for the tree as a whole. One method for doing this is to form a weighted average of all the tree node values, where the weight is a function of the node's depth in the tree. One can also raise the values being combined to a negative power, so that a bad consistency value carries the impact of many good values. This can be summarized as:

$$V_{csz} = \left( \frac{\sum_j w_j \left( d_{cs} + V_{cs,j} \right)^{-p}}{\sum_j w_j} \right)^{-1/p} - d_{cs}$$

where the sums are over all group nodes in the content tree, $w_j$ is the node depth, $V_{cs,j}$ is the consistency of size of the node, $d_{cs}$ is a small positive constant and $p$ is a positive value such as 1.

[0590] Figure 90 is an example of consistent size. Figure 91 is an example of inconsistent size. 

[0591] The particular methods for evaluating the contribution from the consistency of size to document comfort level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from the consistency of size should be considered within the scope of the present invention, for example, a function of measured human responses to different object sizes and orderings with respect to consistency of size and the feeling of comfort; such that the present invention is directed not only to the particular method of determining the consistency of size contribution, but also to the much broader concept of using consistency of size measures in the context of evaluating document comfort level and document quality.

CONVENIENCE 

[0592] In a preferred embodiment of the present invention, another document property that contributes to its quality is the perceived convenience level, or ease of use, of the document. A method for quantifying the document convenience level will next be described. As with other properties, convenience is calculated as a combination of simpler properties or factors. Violating any of the component factors can result in inconvenience and ruin the overall convenience of the document layout. Component factors can include consistency, legibility, disability proof, ease of navigation, ease of progression, searchability, locatability, viewable fraction, single window display, and/or transmission and processing time.

[0593] Each factor is defined to produce a value ranging between 0 and 1 such that 0 means a low or bad convenience value and 1 means a high or good convenience value. These (and possibly other such rules) can be calculated and combined to form an overall convenience measure. If $V_i$ is the value calculated for the $i$th rule, then the convenience measure $V_{cv}$ is formed as a function $E$ of these contributions:

$$V_{cv} = E(V_{cns}, V_{lg}, V_{dp}, V_{en}, V_{ep}, V_{sh}, V_{lo}, V_{vf}, V_{sw}, \ldots, V_{tm})$$

[0594] The combining function $E$ can be as simple as a weighted average of the contributions, but because any bad contributor can ruin the convenience no matter how good the others are, a linear combination is not preferred. An alternative is to use:

$$V_{cv} = \left[ \sum_i w_i \left( d + V_i \right)^{-p} \right]^{-1/p} - d$$

The $w_i$ factors are the weights that specify the relative importance of each rule; they should sum to 1. The exponent $p$ introduces the nonlinearity that can make one bad value overwhelm many good ones; the larger $p$ is, the greater this effect. The constant $d$ is a positive number near 1 and guards against division by 0.

[0595] Other combining functions are possible; for example, one could take the product of the contributions. If weighting of the contributions is desired, this can be done by exponentiation (where the weights would be different from those used above):

$$V_{cv} = \prod_i V_i^{w_i}$$
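Both combining functions for convenience can be sketched in a few lines. This is illustrative only: the factor values, weights, and the choice p = 2 in the usage lines are made up, and d defaults to 1 following the description of d in [0594]. The same power combiner serves for the consistency formula of [0602] with a smaller d.

```python
import math


def power_combine(values, weights, p=2.0, d=1.0):
    """[sum_i w_i (d + V_i)^-p]^(-1/p) - d, weights summing to 1.

    Larger p lets one poor factor dominate; d keeps (d + V_i) away from zero.
    """
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights should sum to 1")
    total = sum(w * (d + v) ** (-p) for v, w in zip(values, weights))
    return total ** (-1.0 / p) - d


def product_combine(values, exponents):
    """prod_i V_i ** w_i: any factor near 0 drives the product toward 0."""
    return math.prod(v ** e for v, e in zip(values, exponents))


factors = [1.0, 1.0, 0.0]          # two perfect factors, one failing factor
even = [1 / 3, 1 / 3, 1 / 3]
print(power_combine(factors, even))           # well below the plain mean of 2/3
print(power_combine(factors, even, d=0.1))    # smaller d: the failure dominates more
print(product_combine([0.9, 0.8], [1.0, 0.5]))
```

The usage lines show how d and p jointly control the nonlinearity: shrinking d toward 0 makes a single zero-valued factor pull the overall score close to 0, which the product form does outright.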

[0596] A combination of measures, as illustrated in Figure 92, is useful in evaluating the document's convenience.
[0597] More specifically, the convenience, as illustrated in Figure 92, is considered a combination of consistency, legibility, disability proof, ease of navigation, ease of progression, searchability, locatability, viewable fraction, single window display, and/or transmission and processing time. In Figure 92, the quantized convenience value is derived by combining these factors using a convenience quantizer or combiner circuit 70.

[0598] It is noted that, although the illustration shows a circuit for the convenience quantization process, this process may also be performed in software by the microprocessor and/or firmware. The quantization is not limited to specific circuits, but may use any combination of software and/or hardware that is able to carry out the methodologies described herein.
[0599] Note that the set of rules chosen is illustrative of how a convenience measure can be constructed. Other 
factors contributing to ease of use exist and could certainly be included in a more sophisticated quantification of con- 
venience. The particular methods for evaluating document convenience provided herein are exemplary and are not to 
be considered as limiting in scope. 

[0600] Other methods for determining the document convenience should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to the feeling of convenience; such that the present invention is directed not only to the particular method of determining the convenience level, but also to the much broader concept of using a combination of individual measures in the context of evaluating document convenience level and document quality.

[0601] In a preferred embodiment of the present invention, graphic design has many consistency rules. Consistency helps people build an internal model of the document that, in turn, makes it easier to use. Some of the rules or factors contributing to consistency, and how they can be combined into an overall consistency measure, will now be described. The example consistency measure will include position order, luminance, size, and/or style. The methods for calculating measures for these factors have been described above and will not be repeated in detail here.
[0602] In combining the component consistency measures, assume that any source of inconsistency will destroy the overall consistency. A combining formula that can be used is as follows:

$$V_{cns} = \left[ \sum_i w_i \left( d + V_i \right)^{-p} \right]^{-1/p} - d$$

where the $V_i$ are taken from the set $V_{cp}$, $V_{cl}$, $V_{csz}$ and $V_{cst}$. The weights $w_i$ indicate the relative importance of the different measures. The parameter $p$ is a number 1 or larger and $d$ is a value slightly larger than 0.

[0603] A combination of measures, as illustrated in Figure 94, is useful in evaluating the document's consistency.
[0604] More specifically, the consistency, as illustrated in Figure 94, is considered a combination of position order, luminance, size, and/or style. In Figure 94, the quantized consistency value is derived by combining the position order, luminance, size, and/or style using a consistency quantizer or combiner circuit 72.

[0605] It is noted that, although the illustration shows a circuit for the consistency quantization process, this process may also be performed in software by the microprocessor and/or firmware. The quantization is not limited to specific circuits, but may use any combination of software and/or hardware that is able to carry out the methodologies described herein.
[0606] The particular methods for evaluating a measure of the consistency of a document provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the document consistency level should be considered within the scope of the present invention, for example, a function of measured human responses to differing document characteristics with respect to the feeling of consistency; such that the present invention is directed not only to the particular method of determining the consistency level, but also to the much broader concept of using a combination of individual measures in the context of evaluating document consistency level, document convenience level and document quality.

[0607] In a preferred embodiment of the present invention, for position order there are actually two measures, consistency of scan and consistency of order, both of which are described above in the discussion on quantifying how well a document communicates. The layout placement of content objects should follow one of these two rules to achieve a consistent model between logical order and layout position. However, the layout need not follow both models simultaneously. One should therefore combine the consistency of scan $V_{cs}$ and the consistency of order $V_{co}$ into an overall consistency of position $V_{cp}$. A simple way to do this is:

$$V_{cp} = \max(V_{cs}, V_{co})$$

[0608] A more sophisticated alternative is the following:

$$V_{cp} = d_{cp} - \left( \frac{(d_{cp} - V_{cs})^{-p} + (d_{cp} - V_{co})^{-p}}{2} \right)^{-1/p}$$

where $d_{cp}$ is a constant slightly larger than 1 and $p$ is also a number 1 or greater.
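The two ways of merging scan and order consistency are sketched below. Because the negative exponent emphasizes the smaller of the two gaps d_cp - V, the second formula behaves as a smooth maximum that approaches MAXIMUM(V_cs, V_co) as p grows. The function names and the defaults d_cp = 1.05 and p = 4 are illustrative assumptions.

```python
def position_consistency_max(v_cs, v_co):
    """Simple form: the layout only needs to follow one of the two models."""
    return max(v_cs, v_co)


def position_consistency_smooth(v_cs, v_co, d_cp=1.05, p=4.0):
    """d_cp - (((d_cp - V_cs)^-p + (d_cp - V_co)^-p) / 2)^(-1/p).

    d_cp must exceed 1 so both gaps stay positive for values in [0, 1].
    """
    gaps = ((d_cp - v_cs) ** (-p) + (d_cp - v_co) ** (-p)) / 2
    return d_cp - gaps ** (-1.0 / p)


print(position_consistency_max(0.9, 0.2))      # 0.9
print(position_consistency_smooth(0.9, 0.2))   # a little below 0.9
```

The smooth form gives slightly less credit than the hard maximum when only one model is followed, and rewards a layout that satisfies both models at once.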

[0609] A combination of measures, as illustrated in Figure 93, is useful in evaluating the document's consistency of position.
[0610] More specifically, the consistency of position, as illustrated in Figure 93, is considered a combination of consistency of scan and/or consistency of order. In Figure 93, the quantized consistency of position value is derived by combining the consistency of scan and/or consistency of order using a consistency of position quantizer or combiner circuit 71.

[0611] It is noted that, although the illustration shows a circuit for the consistency of position quantization process, this process may also be performed in software by the microprocessor and/or firmware. The quantization is not limited to specific circuits, but may use any combination of software and/or hardware that is able to carry out the methodologies described herein.
[0612] The particular methods for evaluating the contribution from the consistency of position to document consistency level and convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from consistency of position should be considered within the scope of the present invention, for example, a function of measured human responses to different positioning of content objects with respect to the feeling of consistency and document convenience level; such that the present invention is directed not only to the particular method of determining the consistency of position contribution, but also to the much broader concept of using consistency of position measures in the context of evaluating document consistency level, document convenience level and document quality.

[0613] In a preferred embodiment of the present invention, a method for computing a measure of the consistency of luminance $V_{cl}$ is described in the above discussion of document comfort. The idea is that darker items should precede lighter ones in a group.

[0614] The particular methods for evaluating the contribution from the consistency of luminance to document consistency level and convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from consistency of luminance should be considered within the scope of the present invention, for example, a function of measured human responses to different luminance settings and orderings of content objects with respect to the feeling of consistency and document convenience level; such that the present invention is directed not only to the particular method of determining the consistency of luminance contribution, but also to the much broader concept of using consistency of luminance measures in the context of evaluating document consistency level, document convenience level and document quality.

[0615] In a preferred embodiment of the present invention, a method for computing a measure of the consistency of 
size V csz is also presented in the above discussion on document comfort. The idea is that larger items should precede 
smaller ones in a group. 

[0616] The particular methods for evaluating the contribution from the consistency of size to document consistency level and convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from consistency of size should be considered within the scope of the present invention, for example, a function of measured human responses to different sizes and orderings of content objects with respect to the feeling of consistency and document convenience level; such that the present invention is directed not only to the particular method of determining the consistency of size contribution, but also to the much broader concept of using consistency of size measures in the context of evaluating document consistency level, document convenience level and document quality.

[0617] In a preferred embodiment of the present invention, a method for computing a measure of the consistency of 
style V cst is presented above in the discussion of ease of use of groups. The idea is that items at similar positions in 
the content structure should have matching styles. 

[0618] The particular methods for evaluating the contribution from the consistency of style to document consistency level and convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from consistency of style should be considered within the scope of the present invention, for example, a function of measured human responses to different styles and orderings of content objects with respect to the feeling of consistency and document convenience level; such that the present invention is directed not only to the particular method of determining the consistency of style contribution, but also to the much broader concept of using consistency of style measures in the context of evaluating document consistency level, document convenience level and document quality.

[0619] In a preferred embodiment of the present invention, a document that is difficult to read is often difficult to use. A measure of legibility $V_{lg}$ was defined above as a contributor to a document's communicability. It can contribute to convenience as well as communicability, but with a different weight. In fact, one could argue that communicability as a whole should be used as a contributor to convenience. While this is not ruled out, the example here will just include a few of the components of communicability that have particular bearing on convenience. Considering them separately allows one to give them different weights when contributing to convenience than those used for the contribution to communicability.

[0620] The particular methods for evaluating the contribution from legibility to document convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from legibility should be considered within the scope of the present invention, for example, a function of measured human responses to different text characteristics with respect to legibility and the feeling of convenience; such that the present invention is directed not only to the particular method of determining the legibility contribution, but also to the much broader concept of using legibility measures in the context of evaluating document convenience level and document quality.

[0621] In a preferred embodiment of the present invention, disability proof refers, in general, to how well the document can serve people with handicaps. For example, a document of only text can be read aloud to someone who is blind, but a document with images would be much harder to convey. Another example of a contributor to a disability proof measure is the red-green friendliness property that was defined in the above discussion on how well a document communicates. The idea behind the measure is that there should be either luminance contrast or blue-yellow contrast between foreground and background colors in order to be red-green friendly. Without this contrast it would be difficult for a colorblind person to distinguish a foreground object from the background. This measure will be used as an example of a simple disability proof function, $V_{dp}$. Additional functions for other handicaps are certainly possible and could be combined into a more sophisticated measure.

[0622] The particular methods for evaluating the contribution from disability compensation characteristics to document convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from disability compensation should be considered within the scope of the present invention, for example, a function of measured human responses to different document characteristics with respect to disability compensation and the feeling of convenience; such that the present invention is directed not only to the particular method of determining the disability compensation contribution, but also to the much broader concept of using disability compensation measures in the context of evaluating document convenience level and document quality.

[0623] In a preferred embodiment of the present invention, methods for estimating the ease of navigation $V_{en}$ and ease of progression $V_{ep}$ were also described above in the discussion of how well a document communicates. They contribute to convenience as well as communicability and, in fact, are more important (and have larger weights) as convenience measures than as communicability measures. The idea behind the calculation of these properties is to estimate and combine contributing features such as distinguishability, group identity, spatial coherence, list bullets, headings, internal links, alignment and others.

[0624] The particular methods for evaluating the contribution from ease of navigation or ease of progression to document convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from ease of navigation or ease of progression should be considered within the scope of the present invention, for example, a function of measured human responses to different document characteristics with respect to ease of navigation or ease of progression and the feeling of convenience; such that the present invention is directed not only to the particular method of determining the ease of navigation or ease of progression contribution, but also to the much broader concept of using ease of navigation or ease of progression measures in the context of evaluating document convenience level and document quality.

[0625] In a preferred embodiment of the present invention, two other related concepts are the searchability $V_{sh}$ and the locatability $V_{lo}$. Locatability is a measure of how easy it is to find a document object (whereas ease of navigation is how easy it is to find a document location). Searchability is a rougher measure that looks for the presence of document features that aid in locating document objects. These measures have been described above in the discussion of measures for the ease of use of content groups.

[0626] The particular methods for evaluating the contribution from searchability or locatability to document convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from searchability or locatability should be considered within the scope of the present invention, for example, a function of measured human responses to different document characteristics with respect to searchability or locatability and the feeling of convenience; such that the present invention is directed not only to the particular method of determining the searchability or locatability contribution, but also to the much broader concept of using searchability or locatability measures in the context of evaluating document convenience level and document quality.
[0627] In a preferred embodiment of the present invention, when a document is broken into pages, some content groups may get spread over two or more pages. If the document is displayed on a workstation, some entire content groups may not fit completely into the display window. This inability to view the logical group as a unit can be a hindrance and should reduce the document's convenience measure.

[0628] To estimate the viewable fraction for a group displayed on a workstation, first find the bounding size (width and height) of the group, $(w_g, h_g)$. Next find the size of the typical display window, $(w_p, h_p)$. The viewable width and height are the minimum of the group and window dimensions:

$$w_v = \min(w_g, w_p)$$

$$h_v = \min(h_g, h_p)$$

The measure of unity of display for the group is then given by the ratio of the visible area to the group area:

$$U = \frac{w_v h_v}{w_g h_g}$$
[0629] For the case where the group has been split over pages, one can construct a measure by first finding the area of the group elements on each page (e.g., $A_{g,p}$ for page $p$). Next find the maximum area among the pieces and divide it by the total group area:

$U = \dfrac{\max_p A_{g,p}}{\sum_p A_{g,p}}$
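As an illustration only, the two unity-of-display measures of [0628] and [0629] can be sketched in Python as follows. The function names are hypothetical, dimensions are assumed to be positive numbers in any consistent unit, and the page-split case assumes the per-page areas of the group's pieces are already known.

```python
def window_unity(group_w, group_h, win_w, win_h):
    """Unity of display U for a group shown in a display window:
    visible area (clipped to the window) divided by the group's bounding area."""
    visible_w = min(group_w, win_w)
    visible_h = min(group_h, win_h)
    return (visible_w * visible_h) / (group_w * group_h)


def page_split_unity(piece_areas):
    """Unity of display U for a group split over pages:
    largest single-page piece divided by the total area of all pieces."""
    return max(piece_areas) / sum(piece_areas)


# A group twice as tall as the window shows half of its area at once.
print(window_unity(800, 1200, 800, 600))   # 0.5
# A group split 30/10 across two pages keeps three quarters of its area together.
print(page_split_unity([30.0, 10.0]))      # 0.75
```

A group that fits entirely inside the window, or entirely on one page, scores 1.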

[0630] While this provides a measure for any particular group within a document, one still has to combine these group measures to achieve an overall measure of the document's viewable fraction. The level of the group within the document's logical tree structure should make a difference: one would be much less likely to expect or need high-level groups to be seen as a unit than the low-level groups near the bottom of the tree. First sort the groups by their tree level and find a simple average value for each level (i.e., $U_{av,L}$). Then combine the average values for the levels, weighted by a function of the level:

$V_{vf} = \dfrac{\sum_L w(L)\, U_{av,L}}{\sum_L w(L)}$

[0631] The weighting function $w(L)$ should increase with increasing level, for example $w(L) = a^{L}$ for a constant $a$.
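A sketch of the level-weighted combination of [0630] and [0631] follows. It takes each group as a hypothetical (level, U) pair, averages U within each tree level, and weights the level averages with w(L) = a**L. The exponential form and the default a = 2 are assumptions for illustration; any weighting that increases with level fits the description.

```python
from collections import defaultdict


def viewable_fraction(groups, a=2.0):
    """V_vf: weighted mean of per-level average unity values.

    groups -- iterable of (tree_level, unity) pairs
    a      -- base of the assumed weighting w(L) = a ** L
    """
    by_level = defaultdict(list)
    for level, unity in groups:
        by_level[level].append(unity)

    weighted_sum = 0.0
    weight_total = 0.0
    for level, values in by_level.items():
        weight = a ** level
        level_average = sum(values) / len(values)   # U_av,L
        weighted_sum += weight * level_average
        weight_total += weight
    return weighted_sum / weight_total


# Level 0 averages 0.5 (weight 1); level 1 averages 0.75 (weight 2).
print(viewable_fraction([(0, 0.5), (1, 1.0), (1, 0.5)]))  # approximately 0.667
```

Averaging within a level first keeps a level with many small groups from dominating the result merely by group count.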

[0632] The particular methods for evaluating the contribution from viewable fraction to document convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from viewable fraction should be considered within the scope of the present invention, for example, a function of measured human responses to different viewable amounts of the document with respect to the feeling of convenience; such that the present invention is directed not only to the particular method of determining the viewable fraction contribution, but also to the much broader concept of using viewable fraction measures in the context of evaluating document convenience level and document quality.

[0633] In a preferred embodiment of the present invention, while the viewable fraction measure gives some indication of whether document components can be seen in their entirety, there is a special advantage in being able to see the entire document in a single window or page. A simple calculation can be used to create this measure. It is the same as for the viewable fraction, only it uses the area of the entire document. If the width and height of the document are $w_d$, $h_d$ and the width and height of the display or page are $w_p$, $h_p$, then calculate:

$w_v = \min(w_d, w_p)$

$h_v = \min(h_d, h_p)$

and set the single window display measure to:

$V_{swd} = \dfrac{w_v h_v}{w_d h_d}$
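The single window display measure of [0633] applies the same clipping calculation to the document's overall width and height. A minimal sketch; the function name and the inch units in the usage line are illustrative.

```python
def single_window_display(doc_w, doc_h, view_w, view_h):
    """V_swd: share of the whole document's area visible in one window or page."""
    visible_w = min(doc_w, view_w)
    visible_h = min(doc_h, view_h)
    return (visible_w * visible_h) / (doc_w * doc_h)


# An 8.5 x 22 in. document on an 8.5 x 11 in. page: half of it fits at once.
print(single_window_display(8.5, 22.0, 8.5, 11.0))  # 0.5
```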

[0634] Figure 95 illustrates the generation of an electronic window 150 associated with a page 100 of a document. The electronic window 150 includes navigation buttons to navigate over the page or through the document. This electronic window 150 can be used to define the areas of the document to be analyzed by the present invention, as well as to allow the user to define which classes and sub-parameters are to be measured and quantized by the present invention.

[0635] The particular methods for evaluating the contribution from single-window display of the document to document convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from a single window display should be considered within the scope of the present invention, for example, a function of measured human responses to documents that can or cannot be displayed in a single window or page with respect to the feeling of convenience; such that the present invention is directed not only to the particular method of determining the single window display contribution, but also to the much broader concept of using single-window display measures in the context of evaluating document convenience level and document quality.

[0636] In a preferred embodiment of the present invention, one of the more annoying and inconvenient occurrences when obtaining or processing a document is having to wait while the machine works on downloading or displaying it. The transmission time is the size of the document file divided by the bandwidth of the communications channel. While processing time can also depend upon the types of objects that the document contains and on the type of processing being done, a rough estimate can be formed as the product of the file size and a processing speed factor. One can therefore use the file size as a rough indicator of these time costs. To convert the file size $S$ into a value between 0 and 1, one can use the expression:

$V_{tm} = \dfrac{a_t}{a_t + S}$

where $a_t$ is a constant approximately equal to the typical document file size.
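The mapping from file size to $V_{tm}$ can be sketched as below. Sizes are assumed to be in bytes, and the value of 500 kB for $a_t$ is purely illustrative; the text only says that $a_t$ is about the typical document file size.

```python
def time_cost_value(file_size, typical_size):
    """V_tm = a_t / (a_t + S): 1 for an empty file, 0.5 at the typical size,
    approaching 0 as the file grows."""
    if file_size < 0 or typical_size <= 0:
        raise ValueError("sizes must be non-negative and typical_size positive")
    return typical_size / (typical_size + file_size)


A_T = 500_000  # hypothetical typical document size in bytes
for size in (0, 500_000, 1_500_000):
    print(size, time_cost_value(size, A_T))  # 1.0, 0.5, 0.25
```

Because the curve is hyperbolic rather than linear, it never reaches 0, so very large documents are penalized progressively without the measure becoming undefined.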
[0637] The particular methods for evaluating the contribution from transmission time or processing time to document convenience level provided herein are exemplary and are not to be considered as limiting in scope. Other methods for determining the contribution from transmission time or processing time should be considered within the scope of the present invention; such that the present invention is directed not only to the particular method of determining the transmission time or processing time contribution, but also to the much broader concept of using time measures in the context of evaluating document convenience level and document quality.

ECONOMY 

[0638] In a preferred embodiment of the present invention, one other dimension by which the quality of a document 
may be judged is by the costs that it incurs. Costs arise in several ways. For printed documents, there is the cost of 
the materials required (the paper and the ink). There is also a cost in the effort required to print the document (labor 
and press time). Material cost may not apply to documents viewed on electronic displays, but there is the cost to
transmit and store the document. There is also the cost in the time the viewer spends waiting while the document is 
transmitted, or while it is being processed for display. Many of these costs depend upon the size of the document (such 
as described above for transmission and processing time). However, other properties can also have an effect. For 
example, the size of the fonts can affect the amount of paper needed for printing, and the presence of color can affect 
the cost of the ink. 

[0639] The above-described quality quantization process can be utilized in many systems. In a preferred embodiment, a system for dynamic document layout in accordance with embodiments of the present invention includes a document layout processing system and printers, although the system can comprise other numbers and types of systems, devices, and components in other configurations. The present invention provides a system and method for dynamic document layout that is able to learn new intelligent mutators during operation and is able to determine the most appropriate sequence of mutators given a document's current characteristics.

[0640] In accordance with a preferred embodiment, the document layout processing system is coupled to the printers, 
although the document layout processing system could be coupled to other types and numbers of devices in other 
configurations. A variety of communication systems and/or methods can be used to operatively couple and communi- 
cate between the document layout processing system and the printers, including a direct connection, a local area 
network, a wide area network, the world wide web, modems and phone lines, or wireless communication technology 
each having communications protocols. In these embodiments, the printers are coupled to the document layout 
processing system by a hard-wire connection over a local area network, although other types of connections, devices, 
and networks, such as a wireless communication system, could be used.

[0641] The document layout processing system includes a processor, a memory storage device, a user input device,
a display device, and an input/output interface device which are coupled together by a bus or other link, although other 
types of document layout processing systems comprising other numbers and types of components in other configura- 
tions can be used. The processor executes a program of stored instructions for one or more aspects of the present 
invention as described herein. 

[0642] The memory storage device stores the programmed instructions for one or more aspects of the present in- 
vention as described herein for execution by the processor, although some or all of the programmed instructions could 
be stored and/or executed elsewhere, such as in the printer(s). A variety of different types of memory storage devices, such as a random access memory (RAM) or a read only memory (ROM) in the system or a floppy disk, hard disk, CD-ROM, or other computer readable medium which is read from and/or written to by a magnetic, optical, or other reading and/or writing system that is coupled to the processor, could be used for the memory storage device to store the programmed instructions described herein, as well as other information.

[0643] The user input device enables an operator to generate and transmit signals or commands to the processor, 
such as a request to print or display a document on printer(s). A variety of different types of user input devices could 
be used for the user input device, such as a keyboard or computer mouse. The display device displays information for the operator of the document layout processing system, such as an image of the document layout or the status of the print job at a first printer. A variety of different types of display devices could be used for the display device, such as a display
monitor. The input/output interface system is used to operatively couple and communicate between the document 
layout processing system and the printers. 

[0644] The first printer is coupled to the document layout processing system, although other types of devices can 
be coupled to the document layout processing system. The first printer prints documents received from the document 
processing system. The first printer has a particular set of characteristics when printing a document which affects the 
resulting printed image of the document, such as margins or a particular paper size on which the document is printed. 
Since the components of a printer, including its connections and operation, are well known, they will not be described 
in detail here. 

[0645] A second printer is also coupled to the document layout processing system, although other types of devices 
can be coupled to the document processing system. The second printer also prints documents received from the 
document processing system. The second printer also has a particular set of characteristics when printing a document 
which affect the resulting printed image of the document and which are different from the characteristics of the first printer, although both printers could have the same characteristics when printing a document. Like those of the first printer, the components of the second printer, including their connections and operation, are well known and will not be described in detail here.

[0646] The document processing system selects a portion of an original document, although other portions or the 
entire original document could be selected for determining a layout. The portion of the document selected is the portion 
that needs re-layout or adjustment. The original document can be obtained in a variety of different manners, such as 
retrieved from the web, from an e-mail attachment, from another computer system, or from a document created by the 
operator. 

[0647] Next, the document processing system compares one or more elements of the selected portion of the original 
document against the same types of elements in portions of a plurality of other stored documents obtained from memory
storage device, although other types of comparisons of other numbers and types of elements and other portions could 
be used. A variety of different types of elements could be used by the document processing system in this comparison, 
such as font size, font type, number of lines of text, line spacing, number of alphanumeric characters, size of an outer 
perimeter of the arrangement of alphanumeric characters, and number of images. The document processing system 
can assign a score to each comparison, such as one score for a complete match, another score for a partial match, 
and no score when there is no match, although other manners for assigning a score can be used. 
[0648] The document processing system identifies the stored document whose portion is closest to the portion of the original document, based on the comparison of the selected elements. In these embodiments, the document processing system generates a score based on the comparison of the elements of the selected portion of the original document against the same types of elements in portions of a plurality of other stored documents. The document processing system identifies the stored document whose portion is closest to the selected portion of the original document based on the highest generated score, although the document processing system could use other ways to identify the stored document with the closest portion.
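The text leaves the score values and the definition of a partial match open. The sketch below assumes, purely for illustration, 2 points for an exact match, 1 point for numeric values within 10% of each other, and 0 otherwise, and it represents each portion as a dictionary from element name (font size, font type, number of images, and so on) to value. All names are hypothetical.

```python
FULL_MATCH = 2     # assumed score for identical element values
PARTIAL_MATCH = 1  # assumed score for numeric values within the tolerance


def element_score(a, b, tolerance=0.10):
    """Score one element comparison: full, partial (numeric, within a
    relative tolerance), or no match."""
    if a == b:
        return FULL_MATCH
    numeric = (int, float)
    if isinstance(a, numeric) and isinstance(b, numeric):
        largest = max(abs(a), abs(b))
        if largest > 0 and abs(a - b) / largest <= tolerance:
            return PARTIAL_MATCH
    return 0


def closest_stored_portion(portion, stored):
    """Return the id of the stored portion with the highest total score.

    portion -- dict of element name -> value for the selected portion
    stored  -- dict of document id -> dict of element name -> value
    """
    def total_score(candidate):
        return sum(element_score(value, candidate.get(name))
                   for name, value in portion.items())
    return max(stored, key=lambda doc_id: total_score(stored[doc_id]))


selected = {"font_size": 10, "font_type": "Times", "num_images": 2}
library = {
    "doc1": {"font_size": 12, "font_type": "Times", "num_images": 0},
    "doc2": {"font_size": 10.5, "font_type": "Helvetica", "num_images": 2},
}
print(closest_stored_portion(selected, library))  # doc2 (score 3 vs. 2)
```

An element missing from a stored portion simply contributes no score, which matches the "no score when there is no match" rule of [0647].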

[0649] The document processing system obtains the one or more mutators used in the identified, stored document 
from memory storage device for possible use in the selected portion of the original document. A variety of different 
types of mutators could be obtained, such as mutators for adjusting a font type, adjusting line spacing, adjusting at
least one color, adjusting a location of at least one section in the portion of the original document, increasing font size 
to increase legibility, and making the line lengths shorter to increase legibility, etc. It is noted that other types of mutators 
alone or in different combinations could be obtained and used. 

[0650] The document processing system identifies the device, such as printer(s), on which the original document is 
to be displayed. The document processing system identifies the device based on instructions received from an operator 
using user input device requesting a particular device to display the original document, although other ways of identi- 
fying the display device can be used, such as a programmed selection in the memory storage device of document 
processing system to use a particular printer for a print job. 

[0651] As part of the identification process, the document processing system obtains information from memory stor- 
age device about the characteristics of the device, although other ways of obtaining information about the character- 
istics of the device can be used, such as an inquiry by the document processing system to the device, such as printer, 
for the information. 

[0652] The document processing system determines which of the one or more mutators obtained from the identified, 
stored document to use on the selected portion of the original document. The document processing system determines 
which of the mutators to use based on the characteristics of the device on which the original document is going to be 
displayed and based on one or more elements of the original document, although other manners for determining which 
of the mutators to select can be used. 

[0653] For example, if the first printer selected for the printing job is a black-and-white printer, a mutator for altering 
color obtained from the identified, stored document is irrelevant and would not be used by the document processing
system. 

[0654] In another example, the document processing system could have lists of mutators stored in memory which 
are associated with particular types of documents, such as for text documents, documents with text and images, and 
documents with images, and then the document processing system would determine to use the obtained mutators that 
were on the appropriate stored list for the type of document that matches the portion of the original document or the original
document. 
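Paragraphs [0652] to [0654] describe two filters: dropping mutators the target device cannot benefit from (such as color changes on a black-and-white printer) and keeping only mutators on a stored list for the document type. A minimal sketch, in which the Mutator and Device classes, the mutator names, and the per-type lists are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutator:
    name: str
    needs_color: bool = False   # True for mutators that only alter color


@dataclass(frozen=True)
class Device:
    name: str
    supports_color: bool


# Hypothetical stored lists of mutators appropriate to each document type.
ALLOWED_BY_DOC_TYPE = {
    "text": {"font_size", "line_length", "line_spacing"},
    "text_and_images": {"font_size", "line_spacing", "move_section", "color"},
    "images": {"move_section", "color"},
}


def select_mutators(obtained, device, doc_type):
    """Keep the obtained mutators that are on the list for this document
    type and that the target device can actually render."""
    allowed = ALLOWED_BY_DOC_TYPE.get(doc_type, set())
    return [m for m in obtained
            if m.name in allowed and (device.supports_color or not m.needs_color)]


obtained = [Mutator("color", needs_color=True), Mutator("font_size"), Mutator("line_length")]
mono = Device("printer-1", supports_color=False)
print([m.name for m in select_mutators(obtained, mono, "text_and_images")])  # ['font_size']
```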

[0655] The document processing system also determines one or more other mutators to apply to the selected portion of the original document, using one or more algorithms for document layout and one or more other style sheets stored in the memory storage device, although other manners for determining which other mutators, if any, to use can be implemented.

[0656] The following is a description of a preferred embodiment of the algorithms and methods used for determining 
mutators and other parameters for document layout, which are stored as programmed instructions for execution by 
document processing system. 

[0657] In determining mutators and other parameters for document layout, the document is modeled as a constraint 
optimization problem which combines both required constraints with non-required design constraints that act as opti- 
mization criteria. One of a set of many existing constraint optimization algorithms is then used to solve the problem, 
resulting in an automatically generated document that is well designed because it has optimized some specified design 
criteria. 

[0658] In particular, a document template is represented as a constraint optimization problem, and therefore contains a set of variables, a value domain for each variable, a set of required constraints, and a set of desired constraints (i.e., optimization functions).

[0659] The areas of the document to be filled with content are modeled as problem variables, as are any parameters 
of the document that can be changed.

[0660] As an example, a template specifies that there are two areas that should be filled with content: areaA and 
areaB. The template also specifies that the positions and sizes of areaA and areaB can be changed. Thus, the problem 
variables for this example are: areaA, areaB, areaA-topLeftX, areaA-topLeftY, areaB-topLeftX, areaB-topLeftY, areaA-width, areaA-height, areaB-width, and areaB-height.

[0661] The constraint optimization formulation further specifies that each problem variable has a value domain con- 
sisting of the possible values to assign to that variable. For variables that are document areas to be filled with content 
(e.g., areaA and areaB), the value domains are the content pieces that are applicable to each area. For variables that 
are document parameters, the value domains are discretized ranges for those parameters, so that each potential value for the parameter appears in the value domain (e.g., [1..MAXINT]). For variables whose value domains are content pieces, the default domain is set up to be all possible content pieces in the associated content database, which is specified in the document template.

[0662] The required constraints specify relationships between variables and/or values that must hold in order for the 
resulting document to be valid. The desired constraints specify relationships between variables and/or values that we would like to satisfy, but that are not required in order for the resulting document to be valid. Constraints may be unary (apply to one value/variable), binary (apply to two values/variables), or n-ary (apply to n values/variables), and in our invention are entered by the user as part of the document template.

[0663] An example of a required unary constraint in the document domain is: areaA must contain an image of a 
castle. An example of a required binary constraint is: areaA-topLeftY + areaA-height < areaB-topLeftY. If we had another variable (areaC), an example of a required 3-ary constraint is: areaA-width + areaB-width > areaC-width. In a variable data application of this invention (one of many possible applications), the constraints would also refer to customer attributes (e.g., areaA must contain an image that is appropriate for customer1.age).

[0664] Desired constraints are represented as objective functions to maximize or minimize. For example, a desired 
binary constraint might be the objective function: f = areaA-width * areaA-height, to be maximized. If more than one 
objective function is defined for the problem, the problem becomes a multi-criteria optimization problem. If it is a multi-criteria optimization problem, we sum the individual objective function scores to produce the overall optimization score
for a particular solution. We can furthermore weight each of the desired constraints with a priority, so that the overall 
optimization score then becomes a weighted sum of the individual objective function scores. 
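The formulation of [0658] to [0664] maps naturally onto a small data structure: a domain per variable, required constraints as predicates, and desired constraints as weighted objective functions whose weighted sum is the optimization score. The sketch below is illustrative; the LayoutProblem class and its score method are hypothetical, and returning None for an invalid layout is an assumed convention.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, int]


@dataclass
class LayoutProblem:
    domains: Dict[str, range]                       # variable -> discretized value domain
    required: List[Callable[[Assignment], bool]]    # constraints that must all hold
    desired: List[Tuple[float, Callable[[Assignment], float]]] = field(
        default_factory=list)                       # (priority weight, objective to maximize)

    def score(self, x: Assignment) -> Optional[float]:
        """Weighted sum of the desired objectives, or None if x is invalid."""
        if any(value not in self.domains[name] for name, value in x.items()):
            return None
        if not all(constraint(x) for constraint in self.required):
            return None
        return sum(weight * objective(x) for weight, objective in self.desired)


problem = LayoutProblem(
    domains={name: range(0, 1001) for name in
             ("areaA-topLeftY", "areaA-height", "areaA-width", "areaB-topLeftY")},
    # Binary constraint from [0663]: areaA must end above the top of areaB.
    required=[lambda x: x["areaA-topLeftY"] + x["areaA-height"] < x["areaB-topLeftY"]],
    # Desired constraint from [0664]: maximize the area of areaA.
    desired=[(1.0, lambda x: x["areaA-width"] * x["areaA-height"])],
)
print(problem.score({"areaA-topLeftY": 0, "areaA-height": 100,
                     "areaA-width": 200, "areaB-topLeftY": 150}))  # 20000.0
```

Any constraint optimization algorithm can then search over assignments drawn from the domains and keep the one with the highest score, as [0665] describes.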

[0665] Any one of the known existing constraint optimization algorithms is then applied to create the final output 
document. This invention further describes a means to use a genetic algorithm (one of the many possible constraint
optimization algorithms) for doing the constraint optimization and thereby automatically creating a final output document 
that adheres not only to the required constraints, but also to a set of desired constraints. 

[0666] In the genetic algorithm formulation of constraint optimization for document creation, the genome is built such 
that each gene in the genome is a variable of the constraint problem. The unary constraints are used to set up the 
allowable value domains for each gene. These can be some default range, or input by the user.

[0667] The fitness function is defined such that it returns a fitness of 0 for any population members that do not meet the required constraints, and for the members that do meet the required constraints, it returns a fitness score that is a sum of the scores of the individual desired constraints. For instance, if we have the required constraints:

C1: areaA-width ≤ 300

C2: areaB-width ≤ 300

and the desired constraints:

C3: areaA-width = areaB-width, to be maximized (ranges from 0 to 1)

C4: areaA-height = areaB-height, to be maximized (ranges from 0 to 1)

examples of fitness functions for these desired constraints are:

[0668]

$f_3 = 1 - \dfrac{|\text{areaA-width} - \text{areaB-width}|}{\text{areaA-width} + \text{areaB-width}}$

$f_4 = 1 - \dfrac{|\text{areaA-height} - \text{areaB-height}|}{\text{areaA-height} + \text{areaB-height}}$
[0669] If we have a population member with areaA-width = 350, areaA-height = 350, areaB-width = 400, areaB-height = 200, the fitness function returns a score of 0. If, however, we have a population member with areaA-width = 300, areaA-height = 200, areaB-width = 300, areaB-height = 200, the fitness function returns a score of 2. If we have a population member with areaA-width = 225, areaA-height = 200, areaB-width = 300, areaB-height = 200, the fitness function returns a score of approximately 1.857 (f3 = 1 - 75/525 ≈ 0.857 and f4 = 1).

[0670] Our formulation also extends to allow weighting of the various desired constraints. Thus, the document creator can specify that certain desired constraints are more important than others. For instance, we could have constraint C3 weighted with an importance of 1.5, and C4 weighted with an importance of 0.5, meaning that the two objects having the same width is more important than the two objects having the same height. The fitness function's overall score is then computed as a weighted sum of the individual desired constraints.

[0671] For instance, if we have a population member with areaA-width = 225, areaA-height = 200, areaB-width = 300, areaB-height = 200, desired constraint C3 returns 0.857, which is multiplied by C3's weight of 1.5 to get 1.286. Desired constraint C4 returns 1, which is multiplied by C4's weight of 0.5 to get 0.5. The overall fitness score is then 1.286 + 0.5 = 1.786.

[0672] If, on the other hand, we have a population member with areaA-width = 300, areaA-height = 200, areaB-width = 300, areaB-height = 150, desired constraint C3 returns 1, which is multiplied by C3's weight of 1.5 to get 1.5. Desired constraint C4 returns 0.857 (1 - 50/350), which is multiplied by C4's weight of 0.5 to get 0.429. The overall fitness score is then 1.5 + 0.429 = 1.929, thereby preferring the solution that violates C3 the least.
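The fitness function of [0667] to [0672] can be sketched as follows. C1 and C2 are read here as inclusive bounds (width at most 300), which is what the worked example with both widths equal to 300 requires; the function and parameter names are illustrative. Evaluating f3 for widths 225 and 300 gives 1 - 75/525 ≈ 0.857.

```python
def similarity(a, b):
    """f3/f4 style score: 1 when a == b, falling toward 0 as they diverge."""
    return 1 - abs(a - b) / (a + b)


def fitness(a_w, a_h, b_w, b_h, w3=1.0, w4=1.0):
    """Fitness of one population member (areaA/areaB widths and heights)."""
    # Required constraints C1 and C2: any violation gives fitness 0.
    if not (a_w <= 300 and b_w <= 300):
        return 0.0
    f3 = similarity(a_w, b_w)   # desired constraint C3: equal widths
    f4 = similarity(a_h, b_h)   # desired constraint C4: equal heights
    return w3 * f3 + w4 * f4


print(fitness(350, 350, 400, 200))                        # 0.0
print(fitness(300, 200, 300, 200))                        # 2.0
print(round(fitness(225, 200, 300, 200), 3))              # 1.857
print(round(fitness(225, 200, 300, 200, 1.5, 0.5), 3))    # 1.786
print(round(fitness(300, 200, 300, 150, 1.5, 0.5), 3))    # 1.929
```

With the 1.5/0.5 weights, the member with equal widths scores higher, matching the preference stated in [0672].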

[0673] In the genetic algorithm implementation, an initial population of chromosomes is created by selecting values for each gene, and doing this for the desired number of population members. We evaluate each member of this population according to the fitness function, resulting in a score for each population member. We then select the most fit individuals (i.e., those with the best fitness scores) as parents for the new population, and create a new population from the parents using crossover/mutation operations. We iterate through populations until we reach a specified stopping condition (e.g., a certain number of iterations has been completed, or we have crossed a minimum threshold for the fitness function).
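A compact sketch of the loop in [0673], for illustration only. It assumes each gene's domain is a sequence that can be sampled from, and it uses truncation selection of the fittest members as parents, uniform crossover, and random-reset mutation; these are common textbook choices, not ones the text prescribes. It stops after a fixed number of generations or once a fitness threshold is reached. All names and default values are hypothetical.

```python
import random


def genetic_search(domains, fitness, pop_size=40, n_parents=10, generations=100,
                   mutation_rate=0.1, target=None, seed=0):
    """Return the fittest gene assignment found.

    domains -- dict of gene name -> sequence of allowed values
    fitness -- function mapping an assignment dict to a score (higher is better)
    """
    rng = random.Random(seed)
    genes = list(domains)

    # Initial population: a random value for every gene of every member.
    population = [{g: rng.choice(domains[g]) for g in genes} for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        if target is not None and fitness(best) >= target:
            break  # stopping condition: fitness threshold reached
        # Selection: the most fit members become parents.
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        population = []
        while len(population) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one parent at random.
            child = {g: (mother if rng.random() < 0.5 else father)[g] for g in genes}
            # Mutation: occasionally reset a gene to a random allowed value.
            for g in genes:
                if rng.random() < mutation_rate:
                    child[g] = rng.choice(domains[g])
            population.append(child)
        best = max(population + [best], key=fitness)
    return best


# Toy usage: prefer equal widths, both at most 300 (cf. C1 to C4 above).
widths = {"areaA-width": range(100, 401, 25), "areaB-width": range(100, 401, 25)}


def toy_fitness(m):
    if m["areaA-width"] > 300 or m["areaB-width"] > 300:
        return 0.0
    return 1 - abs(m["areaA-width"] - m["areaB-width"]) / (m["areaA-width"] + m["areaB-width"])


print(genetic_search(widths, toy_fitness, target=1.0))
```

Carrying the best member forward across generations (a simple form of elitism) guarantees that the returned solution is never worse than the best member of the initial population.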

[0674] Thus, each genome is evaluated according to how well it satisfies or achieves the design qualities along with the other required constraints. This evaluation results in a generated document that not only satisfies the required constraints, but that is also optimized for the specified design qualities.

[0675] The document processing system determines an order or sequence for applying the one or more obtained 
mutators and the one or more determined mutators to the selected portion of the original document. In these particular 
embodiments, the document processing system determines the order based on the order the obtained mutators were 
used in the identified, stored document, although other manners for determining the order for applying the mutators
could be used. 

[0676] For example, the ordering may be a learned function based on noting the effectiveness of orderings on the 
document quality measure. In another example, the selected order for applying mutators could be based on a predetermined priority order for applying mutators which is stored in memory. The document processing system would determine where each of the obtained mutators occurred in the stored priority order and then would base the order of applying the mutators on this determination.
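The priority-order variant in [0676] amounts to a stable sort of the obtained mutators by their position in a stored priority list. A sketch with a hypothetical priority list and mutator names; placing mutators that are missing from the list last is an assumption, since the text does not say how they are handled.

```python
# Hypothetical stored priority order, highest priority first.
PRIORITY = ["move_section", "font_size", "line_length", "line_spacing", "color"]


def order_mutators(mutators, priority=PRIORITY):
    """Order mutator names by their position in the priority list.

    Names not in the list go last; sorted() is stable, so they keep
    their original relative order."""
    rank = {name: i for i, name in enumerate(priority)}
    return sorted(mutators, key=lambda name: rank.get(name, len(priority)))


print(order_mutators(["color", "unknown", "font_size", "move_section"]))
# ['move_section', 'font_size', 'color', 'unknown']
```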

[0677] The document processing system applies the selected one or more obtained mutators and the one or more 
determined mutators in the determined order to the selected portion of the original document.
[0678] The document processing system stores the selected portion of the original document with the applied mutators as one of the stored documents in the memory storage device. The newly stored portion of the original document
can now be used to assist with determining the layout of other portions of the original document or of other documents 
to be displayed. 

[0679] The document processing system determines if another portion of the original document should be selected 
for determining a dynamic document layout. If one or more additional portions in the original document are desired to be selected, for example if other portions of the original document have not already been selected, the process for
determining a dynamic document layout begins again for the newly selected portion of the original document in the 
same manner as described above. If no more portions in the original document are desired to be selected, for example 
if the entire original document was selected for processing or all of the portions of the original document have already 
been selected, the process for determining a dynamic document layout ends. 

[0680] In this preferred embodiment, although a case-based approach is provided to apply mutators to a document
to obtain a desirable document layout, the concepts of the present invention can also continuously store the determined 
layouts for use in determining the layout of future documents. By combining case-based mutators with genetic algo- 
rithms for dynamic document layout, a more efficient and reliable automated scheme for dynamic document layout is 
realized. 

Claims

1. A method for quantifying a measure of quality of a document, comprising: 
(a) measuring a predetermined set of characteristics of the document;

(b) quantizing the measured predetermined set of characteristics of the document; and 

(c) generating a quantized convenience value for the document based on a predetermined combining function, 
the predetermined combining function combining the quantized measured predetermined set of characteristics, the quantized convenience value being a measure of quality of the document.

2. The method as claimed in claim 1, wherein the predetermined combining function is a weighted average or sum
of the quantized measured predetermined set of characteristics. 

3. The method as claimed in claim 1, wherein the predetermined combining function is a weighted product of or a 
non-linear operation performed upon the quantized measured predetermined set of characteristics. 

4. The method as claimed in claim 1, wherein one of the quantized predetermined characteristics is one of consistency, disability proof, ease of navigation, viewable fraction or color harmony of the document.

5. The method as claimed in claim 1, wherein one of the quantized predetermined characteristics is legibility, searchability or locatability of objects in the document.

6. The method as claimed in claim 1, wherein one of the quantized predetermined characteristics is single window
displayability or transmission/processing time of the document. 

7. The method as claimed in claim 4, wherein the quantized consistency of the document is realized by: 
(i) measuring and quantizing consistency of scan in the document;

(ii) measuring and quantizing consistency of order in the document; 

(iii) measuring and quantizing consistency of position of objects in the document; 

(iv) measuring and quantizing consistency of luminance in the document; 

(v) measuring and quantizing consistency of size of objects in the document; and 
(vi) measuring and quantizing consistency of style of objects in the document.

8. A method for quantifying a measure of quality of a document, comprising: 

(a) measuring a predetermined set of characteristics of the document; 

(b) quantizing the measured predetermined set of characteristics of the document; 

(c) generating a quantized aesthetics value for the document based on a predetermined aesthetics combining function, the predetermined aesthetics combining function combining a predetermined subset of the quantized measured predetermined set of characteristics;

(d) generating a quantized convenience value for the document based on a predetermined convenience com- 
bining function, the predetermined convenience combining function combining a predetermined subset of the 
quantized measured predetermined set of characteristics; and 

(e) generating a quantized quality value for the document based on a predetermined quality combining function, 
the predetermined quality combining function combining the generated quantized aesthetics value and the 
generated quantized convenience value. 
9. A method for quantifying a measure of quality of a document, comprising:

(a) measuring a predetermined set of characteristics of the document; 

(b) quantizing the measured predetermined set of characteristics of the document; 

(c) generating a quantized convenience value for the document based on a predetermined convenience com- 
bining function, the predetermined convenience combining function combining a predetermined subset of the 
quantized measured predetermined set of characteristics; 

(d) generating a quantized ease of use value for the document based on a predetermined ease of use com- 
bining function, the predetermined ease of use combining function combining a predetermined subset of the 
quantized measured predetermined set of characteristics; and 

(e) generating a quantized quality value for the document based on a predetermined quality combining function, 
the predetermined quality combining function combining the generated quantized convenience value and the 
generated quantized ease of use value. 